Patent abstract:
A system and method for long-range and short-range data compression. A system and method for use with streaming data blocks is provided, each of the streaming data blocks including a number of data bits. The system includes a first compressor and a second compressor. The first compressor can receive and store n blocks of the streaming data blocks, can receive and store a data block to be compressed from the streaming data blocks, can compress consecutive bits within the data block to be compressed based on the n stored blocks, and can output a match descriptor and a literal segment. The second compressor can compress the literal segment and can output a compressed data block that includes the match descriptor and a compressed data string based on the compressed literal segment.
Publication number: BR102012002559B1
Application number: R102012002559-0
Filing date: 2012-02-03
Publication date: 2020-09-24
Inventors: Udaya Bhaskar; Chi-Jiun Su
Applicant: Hughes Network Systems, LLC
IPC main class:
Patent description:

[0001] The present invention belongs to the field of data compression techniques, in particular lossless data compression techniques for the efficient transmission of Internet traffic over data communication links such as satellite links and terrestrial wired or wireless links.
[0002] Analysis of Internet traffic reveals that for certain types of content, which constitute a significant portion of the total traffic, there is a high degree of redundancy in the transmitted data. This manifests itself in the form of macro-redundancies and micro-redundancies. Macro-redundancies are essentially duplications of long byte strings, which occur when the same or similar data entities (typically comprising hundreds of bytes or more) are transmitted repeatedly on a link between two end points. Micro-redundancies occur due to the fine-grained syntax underlying the byte strings, which imposes a structure such that some smaller byte patterns (typically a few bytes in length) occur more frequently than others. Both types of redundancy need to be fully exploited by lossless data compression techniques to transmit data more efficiently. The benefit is the conservation of communication link resources (such as channel bandwidth and energy) as well as an improvement in the user experience due to lower latency and faster response time.
[0003] Redundancies in the data stream can appear at many levels. At the highest level, an entire web page or document that was previously transmitted may be retransmitted in the data stream (for example, because the user repeats the request for that entity); at a lower level, an object within the web page (such as an image belonging to an advertisement on a web page) may be retransmitted frequently, because it is common across multiple popular web pages; and at the lowest level, a byte segment that was previously transmitted may reappear in the data stream. Each of these redundancies can be exploited by preventing duplicate data from being retransmitted, provided that appropriate memory and processing techniques are employed at both ends of the connection.
[0004] The range (that is, the separation, in terms of the number of bytes transmitted, between one instance of a byte segment and its redundant occurrence) over which redundancies occur in the data stream can vary from a few bytes to several tens or hundreds of megabytes. It depends on several factors, such as the type of content, link speed, user usage patterns, the number of users attached to the endpoint, etc. In addition, redundancies can be micro-redundancies, where the duplications are only a few bytes in length, or much larger macro-redundancies.
[0005] Some of the common techniques for data compression on the Internet belong to the Lempel-Ziv family of compressors (LZ77, LZ78, or their derivatives such as gzip, compress, or Hughes V.44), or, more recently, compressors based on grammar transforms (for example, the YK compressor of Hughes Network Systems, Inc.). The problem with these compression techniques is that they become overly complex and cease to be practical (for streaming data compression applications) when their dictionary, grammar, or history window size is significantly increased. These techniques can only use data within a relatively short history window (or, equivalently, a small dictionary or grammar) ranging from a few tens of kilobytes to a few megabytes in size. This means that these techniques are only able to exploit redundancies within a relatively small range of consecutive bytes, or "window", ranging from a few tens of kilobytes to a few megabytes. Since web traffic on the Internet exhibits redundancies across tens of megabytes or more, these techniques cannot be used directly to translate such long-range redundancies into compression gains.
[0006] Another important limitation of these techniques is that they cannot compress entities that have already been compressed at the source. For example, an image embedded in a web page is typically already compressed (as a GIF, PNG, or JPEG object). These techniques cannot compress such objects; if such objects are processed by these techniques, they can actually increase the size of the object, which is undesirable.
[0007] Another disadvantage of the LZ family of compressors is that it is inherently unsuited to entropy coding of the LZ compressor tokens with arithmetic coding in a way that fully exploits the optimality of arithmetic coding. It is well known that arithmetic coding is the most efficient form of entropy coding. Consequently, the performance of this type of compressor is, in general, sub-optimal. Grammar-based compressors do not have this shortcoming; in fact, the combination of a grammar transform and arithmetic coding (that is, grammar-based compression) has already been shown to outperform the LZ77 and LZ78 compressors. Grammar transforms and grammar-based compressors are described in United States Patent No. 6,400,289 B1, dated June 4, 2002, and United States Patent No. 6,492,917 B1, dated December 10, 2002, the contents of which are incorporated herein by reference.
[0008] What is needed is a lossless data compression technique that improves the efficiency of transmission of Internet traffic over communication links, such as satellite links or terrestrial links, that can handle entities that have already been compressed at the source, and that provides sufficient compressor memory (cache size).
BRIEF SYNOPSIS
[0009] The present invention provides a system and method for the efficient transmission of Internet traffic over communication links, using a data compression technique that consists of a first, front-end stage comprising a long-range compressor and a second, back-end stage comprising a short-range compressor.
[0010] A block can be considered a string of bytes that the compressor is able to receive at approximately the same time. For example, a relatively small compressor may be able to receive 8 bytes at once, while a relatively large compressor may be able to receive 10³ bytes. In other words, the block can be defined by the capacity of the compressor.
[0011] For purposes of discussion, consider a non-limiting example application in which web browser data is to be compressed. A web page consists of web objects, including image objects, sound objects, text objects, etc. When transmitting data from the web page, the transmitting entity works together with a compressor, where the transmitting entity knows the capacity of the compressor. Suppose, then, that the compressor has a capacity of 10 kilobytes, and suppose that an image object on the web page is 1 megabyte. In this situation, the transmitting entity will decompose the one-megabyte image object into 100 objects of 10 kilobytes each, which will be fed into the compressor. As such, in this example, the block will be 10 kilobytes.
[0012] In accordance with an aspect of the present invention, a system and method for use with streaming data blocks, each of the streaming data blocks including a number of data bits, is provided. The system includes a first compressor and a second compressor. The first compressor can receive and store n blocks of the streaming data blocks, can receive and store a data block to be compressed from the streaming data blocks, can compress consecutive bits within the data block to be compressed based on the n stored blocks, and can output a match descriptor and a literal segment. The match descriptor is based on the compressed consecutive bits. The literal segment is based on the remainder of the bits of the data block to be compressed, excluding those consecutive bits. The second compressor can compress the literal segment and can output a compressed data block that includes the match descriptor and a compressed data string based on the compressed literal segment.
[0013] Additional advantages and novel features of the invention are set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the following, or may be learned by practice of the invention. The advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
BRIEF SYNOPSIS OF DRAWINGS
[0014] The accompanying drawings, which are incorporated in and form part of the specification, illustrate an exemplary version of the present invention and, together with the description, serve to explain the principles of the invention. In the drawings:
Figure 1 illustrates a communication system in accordance with an aspect of the present invention;
Figure 2 illustrates an example version of a communication system in accordance with an aspect of the present invention;
Figure 3 illustrates an example version of a circular byte cache in accordance with an aspect of the present invention;
Figure 4 illustrates an example version of a long-range compressed block in accordance with an aspect of the present invention; and
Figure 5 illustrates an example version of an input block in accordance with an aspect of the present invention.
DETAILED DESCRIPTION
[0015] Aspects of the present invention provide a lossless data compression technique that includes a first stage having a long-range compressor front end and a second stage having a short-range compressor back end. The long-range compressor retains a long range of previously received bytes of an input byte stream for compression, and captures macro-redundancies in the input byte stream. For example, a long-range compressor can store copies of the last 10⁹ bytes of data that have been received. As such, a current data stream can be compared against all 10⁹ stored bytes for any similar bit sequences (redundancies). The primary role of this stage is to provide the compressor with access to a large history of previously transmitted data (that is, a large buffer of several tens or hundreds of megabytes), while minimizing the processing complexity required to process the large amount of stored data. The advantage of this stage is that macro-redundancies seen within a long history of the input byte stream can be captured with very modest processing resources.
[0016] The first stage, having a long-range compressor front end, is followed by a second stage having a short-range compressor back end. In one version of the present invention, a grammar-based compressor is used, which employs a sophisticated grammar transform and adaptive arithmetic coding. However, any short-range compressor can be used.
[0017] The primary role of the second stage is to exploit any residual or micro-redundancy in the output of the first stage. In exemplary versions, the second stage applies a much more powerful compression technique than the first stage. Since the first stage has already eliminated long-range redundancies, the second stage can operate with a shorter history (that is, less data to be processed) without any loss in performance. In particular, the short-range compressor retains a short range of previously received bytes of an input byte stream for compression, and captures micro-redundancies in that stream. Whereas the long-range compressor may store copies of, for example, the last 10⁹ bytes of received data, the short-range compressor compares the current data against a much smaller set of stored bytes for similar bit sequences (redundancies). In other words, the short-range compressor uses fewer of the received bytes than the long-range compressor to determine redundancies. This allows the use of techniques much more powerful than those of the first stage, and the combination of the two stages provides compression gain close to the optimum. The grammar transform and the adaptive arithmetic encoder used by the second stage are the key to its performance. The strategy of capturing long-range macro-redundancies with an efficient, computationally simpler first stage allows a more sophisticated second stage to capture the more complex micro-structural redundancies. This keeps the complexity of the overall scheme at a reasonable level while achieving almost optimal compression gains.
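The two-stage structure described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the long-range stage is reduced to whole-block deduplication keyed by a digest, zlib stands in for the grammar-based short-range compressor, and all names are hypothetical.

```python
import hashlib
import zlib

class TwoStageCompressor:
    """Sketch: stage 1 deduplicates whole blocks seen before (long-range);
    stage 2 compresses residual literals with zlib, a stand-in for the
    grammar-based short-range compressor described in the text."""

    def __init__(self):
        self.seen = {}                    # digest -> block id (long-range history)
        self.short = zlib.compressobj()   # keeps its own short-range history

    def compress(self, block: bytes) -> bytes:
        digest = hashlib.sha1(block).digest()
        if digest in self.seen:
            # Macro-redundancy: emit only a tiny match descriptor.
            return b"M" + self.seen[digest].to_bytes(4, "big")
        self.seen[digest] = len(self.seen)
        # Literal: short-range compress and flush so output is self-contained.
        lit = self.short.compress(block) + self.short.flush(zlib.Z_SYNC_FLUSH)
        return b"L" + lit
```

A repeated block costs only 5 bytes (the descriptor), regardless of its size, while novel blocks still benefit from the short-range stage.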
[0018] To fully appreciate the benefits of aspects of the present invention, the differences between a non-streaming-mode compression scheme and a streaming-mode compression scheme should be discussed.
[0019] In a non-streaming compression scheme, compression is based only on the current input data block; after the current block has been compressed, the compressor state is reset (that is, the history buffer is cleared). In the non-streaming compression scheme, only the redundancy within an input block can be compressed. As such, the history of previous blocks cannot be used to compress future blocks.
[0020] Consider, for example, the non-streaming compression scheme of conventional file compressors. With conventional file compressors, if two identical files are input to the compressor, one after the other, the history of the first file will already have been forgotten when the second file is input. As a result, the overall compressed size is twice the compressed size of one file. If the conventional file compressor is used in streaming mode, the overall compressed size will be the compressed size of one file plus a small number of bytes.
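The difference can be demonstrated with a stock compressor such as zlib, whose `compressobj` interface keeps its history window across calls (zlib is used here only as a convenient stand-in; the patent's compressors are not zlib):

```python
import zlib

data = bytes(range(256)) * 8   # a 2 KB "file" with internal structure

# Non-streaming mode: compressor state is reset between the two identical
# files, so the second copy compresses no better than the first.
non_streaming = len(zlib.compress(data)) + len(zlib.compress(data))

# Streaming mode: one compressor object keeps its history across files,
# so the second copy is encoded as back-references into the first.
c = zlib.compressobj()
streaming = len(c.compress(data) + c.compress(data) + c.flush())

# Streaming output is roughly the compressed size of one file plus a
# small number of bytes, well under the non-streaming total.
assert streaming < non_streaming
```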
[0021] In the streaming-mode compression scheme according to aspects of the present invention, compression is based not only on the redundancy within the current input block being processed, but also on the redundancy of the blocks that have already been processed in the past. The compressor history is dynamic and "live", in which only the size of the allocated memory limits the number of blocks that the compressor can remember (can use). Owing to this dynamic memory of the past, a streaming-mode compression scheme according to aspects of the present invention provides significantly better compression gain than a non-streaming-mode compression scheme. The extent of the gain depends on the amount of redundancy present in the data and the size of the history buffer allocated. In particular, if most redundancies exist across a long range of input blocks, the streaming-mode compression scheme according to aspects of the present invention will provide much more efficient compression than the non-streaming-mode compression scheme.
[0022] The present invention provides a system and method for compressing a stream of data blocks in a first compression stage, compressing the resulting stream of blocks in a second compression stage, transmitting the two-stage-compressed stream of blocks, decompressing the two-stage-compressed data in a first decompression stage, and decompressing the resulting stream of blocks in a second decompression stage.
[0023] In an example version, a system is provided for use with streaming data blocks, where each of the streaming data blocks includes a number of data bits. The system includes a first compressor and a second compressor.
[0024] The first compressor can receive and store a first portion of the streaming blocks. For purposes of discussion, assume that the first compressor receives and stores n blocks of the streaming data blocks. The first compressor then receives and stores a data block to be compressed. The first compressor operates to compress consecutive bits within the data block to be compressed based on the n stored blocks of the streaming data blocks.
[0025] The first compressor can output a match descriptor and a literal segment.
[0026] The match descriptor is based on the compressed consecutive bits. For example, for purposes of discussion, suppose that only a portion (a string of consecutive bits) of the data block to be compressed is the same as a portion (a string of consecutive bits) of the first received block, that is, the first of the n previously received streaming data blocks. By providing merely a match descriptor, as opposed to the portion of the block that duplicates the first block (the actual string of consecutive bits), the overall data size is decreased.
[0027] The literal segment is based on the remainder of the bits of the data block to be compressed, excluding the consecutive bits. For example, as discussed above, suppose that only a portion (a string of consecutive bits) of the data block to be compressed is the same as a portion (a string of consecutive bits) of the first received block of the n previously received streaming data blocks. The remainder of the data block to be compressed, which is not the same as any part of the first n received blocks, is provided as a literal segment. These data bits are "literally" the same data bits that were input to the system.
[0028] In an example version, a hash table and a cache are additionally included. In this example version, the first compressor additionally includes a fingerprint computation portion, a fingerprint matching portion, and an output block formation portion.
[0029] The fingerprint computation portion establishes a first window in a first block received from the streaming data blocks. For purposes of discussion, suppose that the fingerprint computation portion establishes a window of data bits in the first data block of the n streaming data blocks. The fingerprint computation portion can compute a first fingerprint based on a plurality of data bits within the first window. The fingerprint of a data window is a string of bits that is much smaller in size than the original data in the window. Given the much smaller size, much less processing power is needed to compare fingerprints than to compare the original data in the windows. For example, when trying to match a 2¹⁰-byte data window against 2⁵ other 2¹⁰-byte data windows, a large amount of processing resources may be required; however, if fingerprints are used, where a fingerprint may be an entity of only a few dozen bits, far less processing is needed merely to match one such small entity against the others. Once the first fingerprint has been computed, the fingerprint computation portion can then establish a second window in the data block to be compressed and compute a second fingerprint based on a plurality of data bits within the second window. The fingerprints computed for each block are stored in a hash table: a first hash index can be created based on the first fingerprint, and a second hash index can be created based on the second fingerprint.
[0030] A hash function is any well-defined procedure or mathematical function that converts a large amount of data, possibly of varying size, into a small piece of data, usually a single integer, that can serve as an index into an array (as in an associative array). According to aspects of the present invention, the values returned by a hash function are indices into a fingerprint hash table, which stores each fingerprint and its associated metadata (that is, the location in the byte cache of the window from which the fingerprint was computed).
[0031] Hash functions are mainly used with hash tables to quickly locate a data record given its search key. Specifically, according to aspects of the present invention, the search key is the fingerprint, which the hash function maps to the hash index. The index gives the place where the corresponding record is stored. The number of possible indices is much smaller than the number of possible fingerprints; thus, hash functions reduce the amount of storage required to save fingerprints.
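The fingerprint and hash-table machinery described in the preceding paragraphs can be sketched as follows. The polynomial fingerprint, the table size, and the entry layout are illustrative assumptions only; the patent does not prescribe a particular fingerprint or hash function.

```python
WINDOW = 64            # fingerprint window size in bytes
TABLE_SIZE = 1 << 16   # number of hash-table slots (illustrative)
BASE, MOD = 257, (1 << 61) - 1   # polynomial parameters (illustrative)

def fingerprint(window: bytes) -> int:
    """Polynomial fingerprint of one WINDOW-byte window (Rabin-style sketch)."""
    fp = 0
    for b in window:
        fp = (fp * BASE + b) % MOD
    return fp

def hash_index(fp: int) -> int:
    """Hash function: maps the large fingerprint to a small table index."""
    return fp % TABLE_SIZE

table = {}   # hash index -> (fingerprint, cache offset) metadata

def insert(cache: bytes, offset: int) -> None:
    """Record the fingerprint of the window starting at `offset` in the cache."""
    fp = fingerprint(cache[offset:offset + WINDOW])
    table[hash_index(fp)] = (fp, offset)

def lookup(window: bytes):
    """Return the cache offset of a matching window, or None."""
    fp = fingerprint(window)
    entry = table.get(hash_index(fp))
    if entry and entry[0] == fp:   # full fingerprint compare rules out collisions
        return entry[1]
    return None
```

Note the two-level check: the small hash index locates the slot cheaply, and the full stored fingerprint is then compared to guard against index collisions.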
[0032] The fingerprint matching portion detects whether a newly computed fingerprint matches any previously computed fingerprint (corresponding to data in the cache). This is based on the hash index of the new fingerprint: the fingerprint stored at that index of the hash table is compared with the new fingerprint. If the two fingerprints are identical, a fingerprint match has occurred. This indicates that an identical window of bits exists somewhere in the cache; the location of that identical window is provided by the metadata.
[0033] When a newly calculated fingerprint is found to match a previous fingerprint for the data in the cache, the matching region is expanded to the maximum possible width.
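The expansion step can be sketched as follows; the helper below is hypothetical (the real implementation operates on the byte cache and input block described later), but it shows how a fingerprint-level match of a fixed window is grown byte by byte in both directions.

```python
def expand_match(cache: bytes, cache_pos: int,
                 block: bytes, block_pos: int, width: int):
    """Grow a fingerprint-level match of `width` bytes to the left and right,
    returning (cache_start, block_start, total_length)."""
    left = 0
    while (cache_pos - left > 0 and block_pos - left > 0
           and cache[cache_pos - left - 1] == block[block_pos - left - 1]):
        left += 1
    right = 0
    while (cache_pos + width + right < len(cache)
           and block_pos + width + right < len(block)
           and cache[cache_pos + width + right] == block[block_pos + width + right]):
        right += 1
    return cache_pos - left, block_pos - left, width + left + right
```

For example, a 5-byte fingerprint match on "WORLD" expands to the full shared run "HELLOWORLD" when the surrounding bytes also agree.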
[0034] The second compressor is arranged to receive and store the match descriptor and literal segment from the first compressor. The second compressor operates only on the literal segment and does not modify the match descriptor. The second compressor can use its own history, dictionary, grammar, or any other form of internal memory of previously input literals to compress the current literal segment; any known compression technique can be used. Finally, the second compressor outputs a compressed data block that includes the match descriptor, passed through unchanged from the first compressor, and a compressed data string based on the compressed literal segment.
[0035] In an example version, the second compressor includes a parsing portion, a grammar transform portion, and an adaptive arithmetic coding portion. The parsing portion successively parses the literal segment into the longest prefixes that match symbols of a grammar. The grammar is updated after each parse. Each parsed symbol and the information pertaining to the grammar update are passed to the adaptive arithmetic coder. The adaptive arithmetic coding portion performs entropy coding to represent the parsed symbol and the grammar update, producing the compressed literal segment, where entropy coding is a lossless data compression scheme that is independent of the specific characteristics of the medium. The compressed block output by the second compressor includes the match descriptor and the compressed literal segment.
[0036] A more detailed discussion of aspects of the present invention will now be explained with reference to Figures 1 to 5.
[0037] Figure 1 illustrates a communication system 100 in accordance with an aspect of the present invention.
[0038] As illustrated in Figure 1, the communication system 100 includes a compression side 102 and a decompression side 104. The compression side 102 can transmit to the decompression side 104 over a communication link 126 having a reliable transport or link layer.
[0039] The compression side 102 includes a long-range compressor 106, a hash table 108, a compressor byte cache 110, a short-range compressor 112, and a compressor grammar transform portion 114. In this example version, the long-range compressor 106, hash table 108, compressor byte cache 110, short-range compressor 112, and compressor grammar transform portion 114 are illustrated as individual devices. However, in some versions of the present invention, at least two of the long-range compressor 106, hash table 108, compressor byte cache 110, short-range compressor 112, and compressor grammar transform portion 114 may be combined as a unitary device. Also, in some versions, at least one of the long-range compressor 106, hash table 108, compressor byte cache 110, short-range compressor 112, and compressor grammar transform portion 114 may be contained as a utility, program, or subprogram in any desired tangible computer-readable storage medium. In addition, the operations may be embodied by computer programs, which can exist in a variety of forms, both active and inactive. For example, they may exist as software programs comprised of program instructions in source code, object code, executable code, or other formats. Any of the above may be embodied on a tangible computer-readable storage medium, which includes storage devices. Exemplary tangible computer-readable storage media include RAM, ROM, EPROM, and EEPROM of a conventional computer system, and magnetic or optical disks or tapes. Concrete examples of the foregoing include the distribution of programs on a CD-ROM or via Internet download. It should therefore be understood that any electronic device capable of performing the functions described above may perform those functions.
When information is transferred or provided over a network or other communication connection (whether hardwired, wireless, or a combination of hardwired and wireless) to a computer, the computer properly views the connection as a tangible computer-readable storage medium. Thus, any such connection is properly termed a tangible computer-readable storage medium. Combinations of the above should also be included within the scope of computer-readable storage media.
[0040] The decompression side 104 includes a short-range decompressor 116, a decompressor grammar transform portion 118, a long-range decompressor 120, and a decompressor byte cache 122. In this example version, the short-range decompressor 116, decompressor grammar transform portion 118, long-range decompressor 120, and decompressor byte cache 122 are illustrated as individual devices. However, in some versions of the present invention, at least two of the short-range decompressor 116, decompressor grammar transform portion 118, long-range decompressor 120, and decompressor byte cache 122 may be combined as a unitary device. Also, in some versions, at least one of the short-range decompressor 116, decompressor grammar transform portion 118, long-range decompressor 120, and decompressor byte cache 122 may be contained as a utility, program, or subprogram in any tangible computer-readable storage medium. In addition, the operations may be embodied by computer programs, which can exist in a variety of forms, both active and inactive.
[0041] The long-range compressor 106 is arranged to receive a stream of data blocks, an example of which is indicated as input data block 124. The input data block 124 varies in length, ranging from a few bytes to thousands of bytes at a time. Some non-limiting examples of the input data block 124 are IP blocks or web objects or any other data blocks that can be communicated over the communication link 126. The long-range compressor 106, hash table 108, and compressor byte cache 110 communicate with one another via signal 130.
[0042] The hash table 108 receives fingerprints computed by the long-range compressor 106.
[0043] A hash function is used to map each fingerprint to its associated hash index. The hash index serves as an index into the hash table 108, where the fingerprint and the metadata associated with that fingerprint value are stored. The hash table 108 can be implemented using any known data structure.
[0044] The compressor byte cache 110 stores the previously received data blocks of the data block stream, which are checked against the input data block 124 for redundancy. The fingerprint metadata stored in the hash table 108 corresponds to the location, in the compressor byte cache 110, of the data window from which the fingerprint was computed. The hash table 108 and the compressor byte cache 110 communicate with each other via signal 132. The compressor byte cache 110 is implemented as a contiguous circular byte buffer, according to one aspect of the invention, with wraparound occurring only at block boundaries. The detailed implementation of the compressor byte cache 110 will be described later.
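The "wraparound only at block boundaries" property can be sketched as below: every stored block occupies one contiguous byte range, so match offsets never straddle the buffer's end. The class and field names here are illustrative, not the patented design.

```python
class CircularByteCache:
    """Sketch of a contiguous circular byte cache: blocks are appended at a
    write position, and wraparound happens only at block boundaries, so each
    stored block is addressable as one contiguous slice."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.write_pos = 0

    def append(self, block: bytes) -> int:
        """Store one block contiguously; return its start offset."""
        if self.write_pos + len(block) > self.capacity:
            self.write_pos = 0   # wrap at a block boundary only
        start = self.write_pos
        self.buf[start:start + len(block)] = block
        self.write_pos += len(block)
        return start
```

A wrap simply restarts at offset 0, overwriting the oldest blocks first, which is how the fixed-size history window ages out old data.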
[0045] For purposes of discussion, suppose that the input data block 124 contains a segment of bytes that occurred in at least one previously received data block of the data block stream. The long-range compressor 106, the hash table 108, and the compressor byte cache 110 work together to search for a duplication of a data segment (not necessarily the entire block) that occurred previously. The long-range compressor 106 extracts characteristic data patterns, also called fingerprints, from the input data block 124. A hash value is calculated for each fingerprint.
[0046] The calculated hash value serves as an index into the hash table 108, where the fingerprint and all the metadata associated with that fingerprint are stored. The metadata of a fingerprint is basically a location index into the compressor byte cache 110: it points to the location of the data (within the compressor byte cache 110) from which the fingerprint was computed. The metadata is used to map a fingerprint back to a sequence of bytes within the compressor byte cache 110. Fingerprints are computed for each byte position of the incoming input data block 124. Based on a fingerprint selection process, most fingerprints are discarded and only a few are stored. In one version, fingerprints whose last six least significant bits (LSBs) are zero are selected to be stored.
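The six-LSB selection rule amounts to deterministic 1-in-64 sampling of the fingerprint stream, which thins the hash table while still anchoring fingerprints roughly every 64 bytes on average. A one-line sketch:

```python
def is_selected(fp: int) -> bool:
    """Store a fingerprint only if its six least significant bits are all
    zero; over well-mixed fingerprints this keeps about 1 in every 64."""
    return fp & 0x3F == 0
```

Because the rule depends only on the fingerprint value (not on position), compressor and decompressor, and different runs over the same data, always select the same anchors.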
[0047] At a later time, if a fingerprint of the input data block 124 matches a fingerprint stored in the hash table 108, this indicates that data bytes of a previously received data block match data bytes of the input data block 124. In one version, a fingerprint is computed over a 64-byte data window. Since more than 64 bytes of data may match, the match region can be expanded to the left (toward bytes received less recently) and to the right (toward bytes received more recently), as described in more detail below. Typically, thousands of bytes may be matched between a current data block and previous data blocks, contributing to long-range compression.
[0048] A valid match indicates that a segment of bytes in the input data block 124 matches a byte segment stored in the compressor byte cache 110. Once a valid match is found, long-range compression of that segment of the input data block 124 can be performed.
[0049] The long-range compressor 106 encodes the matched segment as a match descriptor, which contains information about the location of the matched byte segment within the input data block 124 and the length of the matched segment. Unmatched byte segments, called literal segments, are not compressed. The long-range compressor 106 provides match descriptors and literal segments to the short-range compressor 112 via signal 134.
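One possible encoding of such a match descriptor is sketched below. The patent does not fix a wire format, so the fixed-width layout, the tag byte, and the field order here are assumptions for illustration only.

```python
import struct

MATCH_TAG = 0x01   # hypothetical tag distinguishing descriptors from literals

def encode_match_descriptor(block_offset: int, cache_offset: int,
                            length: int) -> bytes:
    """Pack where the matched segment starts in the input block, where its
    copy lives in the byte cache, and how long it is (big-endian fields)."""
    return struct.pack(">BIII", MATCH_TAG, block_offset, cache_offset, length)

def decode_match_descriptor(buf: bytes):
    tag, block_offset, cache_offset, length = struct.unpack(">BIII", buf)
    assert tag == MATCH_TAG
    return block_offset, cache_offset, length
```

Under this layout a match of any length costs 13 bytes, which is why even modest matched segments yield net compression.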
[0050] The short-range compressor 112 operates to compress short-range duplications in the input data block 124, where some byte patterns occur more frequently than others. In a non-limiting example version, a grammar-based compressor is illustrated, but any short-range compression method can be used for the second compression stage.
[0051] The short-range compressor 112 receives blocks that may include multiple match descriptors and literal segments via signal 134. In one version, the short-range compressor 112 is a more compact and structured form of dictionary-based compressors. Dictionary-based compressors look for patterns in the byte segments and are based on the assumption that certain phrases occur more frequently than others.
[0052] In this non-limiting example version, the short-range compressor 112 communicates with the compressor grammar transform portion 114 via a signal 136. The literal segment is parsed into a sequence of symbols in the grammar transform portion 114. The grammar within the grammar transform portion 114 is updated after each parsing.
[0053] In all, the compression processing applied to the input data block 124 is determined adaptively. A given segment of the input data block 124, depending on (i) the contents of the compressor byte cache 110 of the long-range compressor 106, (ii) the grammar state of the short-range compressor 112 and (iii) the length of the byte segment, can be processed by the long-range compressor 106 followed by the short-range compressor 112, or it can bypass the long-range compressor 106 and be compressed directly by the short-range compressor 112. This is motivated by the observation that when the compressor grammar transform portion 114 contains variables that can compactly represent the given segment of the input data block 124, the short-range compressor 112 is much more efficient than the long-range compressor 106.
[0054] Thus, whenever this condition is satisfied, it is beneficial to directly compress the given segment of the input data block 124 using the short-range compressor 112 (i.e., bypassing the long-range compressor 106). On the other hand, if the compressor grammar transform portion 114 does not contain these variables, the given segment of the input data block 124 is processed by the long-range compressor 106 followed by the short-range compressor 112. In this case, only the uncompressed "literal" segments of the long-range compressor 106 output are processed by the short-range compressor 112. This adaptive compression selection mechanism provides a higher overall compression gain than always applying the long-range compressor 106 followed by the short-range compressor 112 while ignoring the input data or compressor states.
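The adaptive selection described in paragraphs [0053] and [0054] can be illustrated as a simple routing predicate. This is only a sketch, not the patented implementation: the membership test `segment in grammar_phrases` and the threshold `MIN_LONG_RANGE_LEN` are assumptions standing in for the actual grammar-state test and segment-length criterion.

```python
# Hedged sketch of adaptive path selection: a segment bypasses the
# long-range stage when the short-range grammar can already represent
# it compactly, or when the segment is too short for long-range gains.
# Both the membership test and the threshold are assumptions.

MIN_LONG_RANGE_LEN = 64  # assumed minimum length worth long-range matching

def route_segment(segment: bytes, grammar_phrases: set) -> str:
    """Return which compression path a segment takes."""
    if segment in grammar_phrases or len(segment) < MIN_LONG_RANGE_LEN:
        return "short-range only"
    return "long-range then short-range"
```

A long segment the grammar has never seen takes both stages; a short segment, or one the grammar already represents, goes directly to the short-range compressor.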
[0055] The design parameters of the long-range compressor 106 and the short-range compressor 112 are jointly optimized such that the compressor as a whole provides the best trade-off between the compression gain and the resources necessary for implementing the compression, that is, the memory capacity (RAM) and processing power (CPU) at both the server and the client ends of the network connection. The compression gain is maximized while the complexity (i.e., storage space and processing power) is maintained at reasonable levels.
[0056] The short range compressor 112 provides compressed data blocks 138, which are transmitted by the communication link 126 and received by the short range decompressor 116. It is essential that the communication link 126 provides a reliable transport or link layer to ensure that compressed data blocks 138 are delivered to the short-range decompressor 116 in the order of transmission and without errors or missing blocks.
[0057] Short-range decompressor 116 decompresses the compressed data blocks 138 received over the communication link 126 and reproduces data blocks consisting of the match descriptors and literal segments. In this non-limiting example version, a grammar-based decompressor is illustrated as the short-range decompressor, but any second-stage short-range decompressor can be used.
[0058] Short-range decompressor 116 communicates with the decompressor grammar transform portion 118 via a signal 140. The grammar on the decompressor side needs to be updated based on the information received over the communication link 126 such that it is identical to the grammar on the compression side 102, to achieve lossless decompression. Short-range decompressor 116 communicates with long-range decompressor 120 via signal 142.
[0059] Long-range decompressor 120 receives match descriptors and literal segments from the short-range decompressor 116 and reconstructs the input data block accordingly. It communicates with the byte cache of decompressor 122 through a signal 144. The byte cache of decompressor 122 needs to be updated based on the information received over the communication link 126 such that it is identical to the byte cache of compressor 110, so that decompression is lossless. The long-range decompressor 120 copies the matched byte segments from the byte cache of decompressor 122 based on the information provided by signal 142. It places the copied segments in the appropriate locations along with the literal segments to complete the construction of an uncompressed block identical to the input block.
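The reconstruction step above can be illustrated with a short sketch. It assumes, for illustration only, that match descriptors are (cache_offset, block_offset, length) tuples sorted by block offset and that the literal segments exactly fill the gaps between matches.

```python
# Sketch of long-range decompression: copy each matched segment out of
# the decompressor byte cache and interleave the literal segments in
# the gaps, rebuilding a block identical to the original input block.

def rebuild_block(matches, literals, cache: bytes) -> bytes:
    """matches: (cache_offset, block_offset, length) tuples sorted by
    block_offset; literals fill the gaps between matches, in order."""
    out = bytearray()
    lit_iter = iter(literals)
    for cache_off, block_off, length in matches:
        if block_off > len(out):                 # literal gap before this match
            out += next(lit_iter)
        out += cache[cache_off:cache_off + length]
    for trailing in lit_iter:                    # literal after the last match
        out += trailing
    return bytes(out)
```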
[0060] As was discussed above with reference to Figure 1, aspects of the present invention provide a data compression technique consisting of a long-range front end and a short-range back end. Details of the different elements in Figure 1 are treated below with the aid of Figure 2.
[0061] Figure 2 illustrates the example version of a communication system 200 according to an aspect of the present invention.
[0062] As shown in Figure 2, the communication system 200 includes a compression side 202 and a decompression side 204. The compression side 202 includes a long-range compression portion 206 and a short-range compression portion 208.
[0063] The long-range compression portion 206 includes the long-range compressor 106 (shown by a dotted region), hash table 108, and the byte cache of compressor 110, similar to Figure 1. However, the communication between the different elements of the long-range compression portion 206 and their operation are explained in detail with reference to Figure 2. The short-range compression portion 208 likewise includes the short-range compressor 112 and the compressor grammar transform portion 114, similar to Figure 1. However, the communication between the different elements of the short-range compression portion 208 and their operation are explained in detail with reference to Figure 2.
[0064] In this example version, the long-range compressor 106 includes a fingerprint generator 214, a detector and expander of the matching region 216, a block compressor 218, a data update portion 220 and a linked literal history list 222. In this illustration, each of the fingerprint generator 214, the detector and expander of the matching region 216, the block compressor 218, the data update portion 220, the linked literal history list 222, hash table 108 and the byte cache of compressor 110 is illustrated as a separate device. However, in other versions, at least two of these may be combined as a unitary device. Also, in some versions, at least one of the fingerprint generator 214, the detector and expander of the matching region 216, the block compressor 218, the data update portion 220, the linked literal history list 222, hash table 108 and the byte cache of compressor 110 may be contained as a utility, program, or subprogram, in any desired tangible computer-readable storage medium. In addition, operations can be incorporated in computer programs, which can exist in a variety of forms, both active and inactive.
[0065] In this example version, the short-range compressor 112 also includes a byte sequence analyzer 224, a grammar update portion 226 and an adaptive arithmetic encoder 228. In this illustration, each of the byte sequence analyzer 224, the grammar update portion 226, the adaptive arithmetic encoder 228 and the compressor grammar transform portion 114 is illustrated as a separate device. However, at least one of the byte sequence analyzer 224, the grammar update portion 226, the adaptive arithmetic encoder 228 and the compressor grammar transform portion 114 may be combined as a unitary device. Also, in some versions, at least one of the byte sequence analyzer 224, the grammar update portion 226, the adaptive arithmetic encoder 228 and the compressor grammar transform portion 114 may be contained as a utility, program, or subprogram, in any desired tangible computer-readable storage medium. In addition, operations can be incorporated in computer programs, which can exist in a variety of forms, both active and inactive.
[0066] The decompression side 204 further includes a short-range decompression portion 210 and a long-range decompression portion 212. The compression side 202 and the decompression side 204 communicate with each other via the communication link 126, which has a reliable transport or link layer.
[0067] The short-range decompression portion 210 includes the short-range decompressor 116 (as shown by the dotted region) and the decompressor grammar transform portion 118, similar to Figure 1. However, the communication between the different elements of the short-range decompression portion 210 and their operation are explained in detail with reference to Figure 2. In this version, grammar-based decompression is used. However, any short-range decompressor can be used instead.
[0068] In this version, the short-range decompressor 116 includes an adaptive arithmetic decoder 230, a grammar update portion 232 and a byte sequence assembler 234. In this illustration, these elements and the decompressor grammar transform portion 118 are illustrated as separate devices. However, in other versions, at least two of the adaptive arithmetic decoder 230, the grammar update portion 232, the byte sequence assembler 234 and the decompressor grammar transform portion 118 may be combined as a unitary device. Also, in some versions, at least one of the adaptive arithmetic decoder 230, the grammar update portion 232, the byte sequence assembler 234 and the decompressor grammar transform portion 118 may be contained as a utility, program, or subprogram, in any desired tangible computer-readable storage medium. In addition, operations can be incorporated in computer programs, which can exist in a variety of forms, both active and inactive.
[0069] In this version, the long-range decompressor 120 includes a data recovery portion 236, an output block assembler 238 and a cache update portion 240. In this illustration, each of the data recovery portion 236, the output block assembler 238, the cache update portion 240 and the byte cache of decompressor 122 is illustrated as a separate device. However, in other versions, at least two of the data recovery portion 236, the output block assembler 238, the cache update portion 240 and the byte cache of decompressor 122 may be combined as a unitary device. Also, in some versions, at least one of the data recovery portion 236, the output block assembler 238, the cache update portion 240 and the byte cache of decompressor 122 may be contained as a utility, program, or subprogram, on any desired tangible computer-readable storage medium. In addition, operations can be incorporated in computer programs, which can exist in a variety of forms, both active and inactive.
[0070] Now focusing on the compression side 202, the fingerprint generator 214 is arranged to receive a data stream that includes the sequence of contiguous data blocks to be compressed, such as the input data block 124. In one version, the input data block 124 is a stream of bytes comprising Internet traffic. The block size is variable and depends on the layer of the network stack at which compression is applied. For example, at the IP layer the blocks can be IP packets, or at the application layer the blocks can be segments of HTTP objects.
[0071] As the data in the input data block 124 is received, the fingerprint generator 214 calculates a fingerprint for each byte of data based on a sliding window. In one version, a recursively calculated Rabin fingerprint is used to minimize complexity, but any known polynomial calculation scheme for generating a fingerprint can be used. In one version, the fingerprint window is a 64-byte window. Each fingerprint is a compact characterization of the sequence of bytes within its fingerprint window. If any two fingerprints match, the sequences of bytes within the corresponding windows are identical with high probability. Thus, duplicate byte sequences can be detected by comparing their fingerprint values instead of a byte-wise comparison.
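The recursive computation above can be sketched as a rolling polynomial hash. This is an illustrative sketch only: a true Rabin fingerprint uses polynomial arithmetic over GF(2), and the base and modulus constants below are assumptions chosen for clarity, not values from the patent.

```python
# Sketch of a recursively updated (rolling) polynomial fingerprint over
# a sliding window, in the spirit of the Rabin fingerprints described
# above. BASE and MOD are illustrative assumptions.

WINDOW = 64          # fingerprint window size in bytes
BASE = 257           # polynomial base (assumption)
MOD = (1 << 61) - 1  # large prime modulus (assumption)

def fingerprints(data: bytes):
    """Yield (position, fingerprint) for every byte position that
    completes a full window, updating in O(1) per byte."""
    if len(data) < WINDOW:
        return
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the outgoing byte
    fp = 0
    for i in range(WINDOW):            # fingerprint of the first window
        fp = (fp * BASE + data[i]) % MOD
    yield WINDOW - 1, fp
    for i in range(WINDOW, len(data)):
        fp = (fp - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        fp = (fp * BASE + data[i]) % MOD           # add the newest byte
        yield i, fp
```

Because the update is recursive, each new byte costs a constant amount of work, and identical windows at different positions always produce identical fingerprints.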
[0072] The fingerprint is calculated for each byte of the input data block 124. The calculated fingerprints need to be recorded when the input block is added to the compressor byte cache 110 after the block compression is completed. Since the cache sizes can be large, it would not be practical to store all of the fingerprints calculated for each byte of the entire block. As such, a selection scheme is used to reduce the number of fingerprints recorded, in accordance with some versions of the present invention.
[0073] Consider the example of a situation where, for a 2^28-byte cache, there could be 2^28 possible fingerprints, one for each byte position within the byte cache of compressor 110. In this example, consider that for the input data block 124, only one in 64 fingerprints is retained. Therefore, as opposed to providing enough memory to store the possible 2^28 fingerprints, only enough memory to store 2^22 fingerprints is needed. This reduces the storage space required to store fingerprints and fingerprint metadata by a factor of 64.
[0074] Thus, a fingerprint selection process is used to discard most fingerprints and select only a small subset for storage. The key requirement for the selection criterion is that it must be position independent; for example, if two fingerprint windows, at two different positions in the input data block 124, have identical data, the result of the selection criterion must be the same for both fingerprints. To satisfy this requirement, in an example version, the fingerprint generator 214 uses a criterion that selects only fingerprints whose y least significant bits are zero, where y is an integer. If the underlying data is random, this results in random sampling of the calculated fingerprints, and the number of fingerprints selected is reduced by a factor of about 2^y relative to the total number of fingerprints calculated.
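The selection criterion fits in a few lines. The value y = 6 is an assumption chosen to mirror the one-in-64 example above.

```python
# Position-independent fingerprint selection, as described above: keep
# only fingerprints whose y least significant bits are all zero.
# y = 6 keeps roughly one in 64 fingerprints on random data (assumption).

Y = 6
MASK = (1 << Y) - 1

def select(fp: int) -> bool:
    """True if this fingerprint should be stored in the hash table."""
    return (fp & MASK) == 0
```

Because the test depends only on the fingerprint value, two identical windows at different positions are always selected (or rejected) together, satisfying the position-independence requirement.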
[0075] As discussed above, the fingerprint generator 214 calculates and selects fingerprints for the input data block 124. Before further discussing how the selected fingerprints are stored in hash table 108, the operation of hash table 108 and the byte cache of compressor 110 will now be discussed in detail.
[0076] The hash table 108 is used to efficiently store the selected fingerprints of the data in the byte cache of compressor 110 and also to quickly find possible matches against the fingerprints calculated for the input data block 124. The hash table 108 communicates with the byte cache of compressor 110 through a signal 252. Each record in hash table 108 contains the value of the fingerprint and metadata associated with that fingerprint. The metadata of a fingerprint is basically an index into the compressor byte cache 110, and serves to point to the data from which it was calculated. The metadata is used to map the fingerprint back to a sequence of bytes within the byte cache of compressor 110.
[0077] The fingerprints for the data in the byte cache of compressor 110, previously calculated by an identical procedure, are maintained in hash table 108 as previously described. The fingerprints selected for the input data block 124 are compared against the fingerprints for the data in the byte cache of compressor 110, i.e., the fingerprints corresponding to the blocks previously received within the flow of data blocks. As discussed above, if there is a match between an input fingerprint and any of the cached fingerprints, this is indicative of a possible match between the byte sequence of the input data block 124 in the fingerprint window and a sequence of bytes in the byte cache of compressor 110. This match needs to be further verified to eliminate (i) the possibility that the fingerprint in the byte cache is stale, i.e., its data is no longer kept in the compressor byte cache 110 (as it was overwritten by newer data), and (ii) fingerprint collisions, in which two different byte sequences result in the same fingerprint value. Once these possibilities are eliminated, a valid match is indicated, which can form the basis of compression of that segment of the input data block 124. The hash table 108 and the byte cache of compressor 110 receive a data update signal 248 from the data update portion 220, which will be described later.
[0078] The detector and expander of the matching region 216 communicates with hash table 108 and the byte cache of compressor 110 through a signal 250. The detector and expander of the matching region 216 compares the fingerprints generated by the fingerprint generator 214 for the input data block 124 with the previously calculated fingerprints stored in hash table 108, which are associated with data stored in the byte cache of compressor 110. If an input fingerprint matches a fingerprint in hash table 108, a match may exist between the fingerprint window of the input data block 124 and that of the byte cache of compressor 110. Note that the length of the match is at least the length of the fingerprint window, but it may be longer. In one version, the fingerprint window is 64 bytes long. Longer matches lead to higher compression gains. To detect possibly longer matches, the match region is expanded as much as possible both before and after the two matched fingerprint windows. Typically, the match region can be expanded to thousands of bytes for long-range compression.
[0079] The metadata of the fingerprint indicates the location of the fingerprint window in the compressor byte cache 110. Each input byte to the left of the fingerprint window, starting with the first byte to the left of the fingerprint window, is compared against the corresponding cached byte in the byte cache of compressor 110. If there is a match, the match region expands by one byte to the left. This process continues to expand the match region, byte by byte, to the left of the fingerprint windows until (i) there is no match, or (ii) the start of the input data block 124 is reached, or (iii) the beginning of the byte cache of compressor 110 is reached, whichever comes first. Similarly, the match region is also expanded to the right of the fingerprint window until (i) there is no match, or (ii) the end of the input data block 124 is reached, or (iii) the end of the byte cache of compressor 110 is reached, whichever comes first. After this expansion process is completed, a match has been detected between a segment of bytes (at least as long as the width of the fingerprint window, but possibly much longer) of the input data block 124 and a segment of bytes stored in the byte cache of compressor 110.
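The byte-by-byte expansion procedure above can be sketched as follows. The parameter names are assumptions, and the usage example shrinks the window to 4 bytes for brevity.

```python
# Minimal sketch of match-region expansion: starting from two aligned
# fingerprint windows that matched, grow the match byte by byte to the
# left and right, stopping at a mismatch or at either buffer boundary.
# `cache_pos` / `block_pos` are the window start positions (assumed names).

def expand_match(cache: bytes, block: bytes,
                 cache_pos: int, block_pos: int, window: int = 64):
    """Return (cache_start, block_start, length) of the expanded match."""
    left = 0
    while (cache_pos - left > 0 and block_pos - left > 0 and
           cache[cache_pos - left - 1] == block[block_pos - left - 1]):
        left += 1            # expand toward older bytes
    right = 0
    while (cache_pos + window + right < len(cache) and
           block_pos + window + right < len(block) and
           cache[cache_pos + window + right] == block[block_pos + window + right]):
        right += 1           # expand toward newer bytes
    return (cache_pos - left, block_pos - left, left + window + right)
```

For example, with a 4-byte window, matched windows starting at position 3 in both `b"abcdefghij"` and `b"zzcdefghq"` expand to cover the whole shared region "cdefgh".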
[0080] Once the detector and expander of the matching region 216 identifies an expanded match segment in the input data block 124, it provides the relevant information to the block compressor 218 through a signal 244 for compression, and also to the data update portion 220. The data update portion 220 communicates with hash table 108 and the byte cache of compressor 110 via signal 248. The data update portion 220 updates the byte cache of compressor 110 with the content of the input data block 124 for future matches.
[0081] In a non-limiting example version, the compressor byte cache 110 is implemented as a contiguous circular byte buffer, with wraparound occurring only at block boundaries. When a new input data block 124 is added to the byte cache of compressor 110, it overwrites the oldest data in the byte cache of compressor 110. If an entire input data block 124 cannot fit at the end of the compressor byte cache 110, wraparound occurs and the entire input data block 124 is added at the beginning of the compressor byte cache 110. This ensures that the input data block 124 is not subdivided during wraparound at the end of the compressor byte cache 110. This considerably simplifies cache management, the expansion of match regions, and the verification of stale fingerprints. The simplicity provided for verifying fingerprints also means that the size of the fingerprint metadata that needs to be stored is much smaller, reducing the storage complexity. Contiguous storage also allows the expansion of match regions across (cached) block boundaries, leading to longer matches and improved compression gain. The implementation of the byte cache 110 will now be described in greater detail with reference to Figure 3.
[0082] Figure 3 illustrates an example version of a 300 byte circular cache according to an aspect of the present invention.
[0083] As illustrated in the Figure, the circular byte cache 300, with a maximum cache size of 302, includes a plurality of segments, examples labeled as segments 304, 306, 308, 310, 312 and 314.
[0084] Segment 304 contains the oldest block in the circular byte cache 300, which is close to being overwritten by the new block, indicated by a location 316. Segment 314 is the most recent block, which was recorded in the circular byte cache 300. Block 310 includes a region 318, a region 320 and a region 322. Region 318 corresponds to a fingerprint window of a cached fingerprint that matches a fingerprint in the current input block and is detected by the detector and expander of the matching region 216. So, in this example, a consecutive data string from the most recent input block, block 314, matches the consecutive data string within region 318. As there is a match, the region is expanded beyond the initial window to the left (less recently received data) and to the right (more recently received data). Region 320 corresponds to the expansion by the detector and expander of the matching region 216 to the right. Region 322 corresponds to the expansion by the detector and expander of the matching region 216 to the left. Once the total match is determined, the match position within the circular byte cache 300 is known. The byte cache offset 324 indicates the beginning of the expanded match region that matches the segment in the most recent input block, block 314, while the total match length is represented by the double arrows 326.
[0085] The circular byte cache 300 is implemented as a contiguous circular byte buffer, with wraparound occurring only at block boundaries, instead of breaking a block across the cache boundary. When a new input block is added to the circular byte cache 300, it overwrites the oldest data in the cache. If an entire input block cannot fit at the end of the circular byte cache 300, wraparound occurs and the entire block is added at the beginning of the circular byte cache 300.
[0086] For example, if a new block is too large to fit between the next insertion position 316 and the last valid byte position 328, then instead of subdividing the block across the cache boundary, it is added at the beginning of segment 308.
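The insertion policy above can be sketched in a few lines. The class and field names, and the fixed cache size in the example, are illustrative assumptions.

```python
# Sketch of block insertion into a contiguous circular byte cache that
# wraps only at block boundaries: a block that no longer fits at the end
# is written at the start of the buffer instead of being split.

class CircularByteCache:
    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.next_pos = 0          # next insertion position

    def insert(self, block: bytes) -> int:
        """Insert a whole block; return its start offset in the cache."""
        assert len(block) <= len(self.buf)
        if self.next_pos + len(block) > len(self.buf):
            self.next_pos = 0      # wrap around: never subdivide a block
        start = self.next_pos
        self.buf[start:start + len(block)] = block
        self.next_pos = start + len(block)
        return start
```

With a 10-byte cache, inserting a 6-byte block and then a 5-byte block places the second block at offset 0, since 6 + 5 exceeds the cache size.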
[0087] The implementation of the circular byte cache 300 as a contiguous circular byte buffer considerably simplifies cache management, the expansion of match regions, and the verification of stale fingerprints. The simplicity provided for verifying fingerprints also means that the size of the fingerprint metadata that needs to be stored is much smaller, reducing the storage complexity. Contiguous storage also allows the expansion of match regions across (cached) block boundaries, leading to longer matches and an enhanced compression gain.
[0088] The byte cache of compressor 110 and the byte cache of decompressor 122 are example versions of the circular byte cache 300, according to aspects of the invention. The implementation of the circular byte cache 300 as a contiguous circular byte buffer, wrapping around only at block boundaries, has a number of advantages over schemes based on generic circular memory or on blocks. Contiguous byte storage translates to less wasted storage space compared to block-based storage. Contiguous storage also allows the expansion of match regions across (cached) block boundaries, which is not possible when caching in terms of blocks. A block-based cache typically requires the use of an absolute linear block index to detect stale fingerprints. This type of indexing has two problems: (i) the index, which is several bytes in length, needs to be stored as part of the fingerprint metadata, increasing the storage complexity, and (ii) when the linear index eventually wraps around, this event needs to be detected and appropriate measures taken, which introduces complexity. In contrast, the circular byte buffer proposed here overcomes these problems, uses less storage space, is simpler to implement and also improves the compression gain.
[0089] Along with updating the byte cache of compressor 110, the data update portion 220 also updates hash table 108 with the fingerprints selected for the input data block 124, along with their metadata. Note that the metadata corresponds to the input data block 124 that has just been inserted into the byte cache of compressor 110. Given the fingerprint value to be stored in hash table 108, a hash function is used to calculate an index for the location in hash table 108. An attempt is made to insert the fingerprint into the hashed location. Any matching fingerprint, regardless of whether it is valid, stale, or a fingerprint collision, simply has its metadata in the slot overwritten. This ensures that the metadata entry in the hash table for a fingerprint always points to the newest occurrence of a byte segment in the byte cache 110. A non-matching fingerprint is successfully inserted only if the slot is not occupied or contains a stale fingerprint. Even if a large number of fingerprints fail to be inserted, this does not degrade performance, as explained below.
[0090] As new data is inserted into the compressor byte cache 110, it overwrites older data. However, hash table 108 may still contain fingerprints that correspond to the overwritten data. These stale fingerprints are only deleted as needed; that is, if a new fingerprint needs to be inserted into the slot occupied by a stale fingerprint. A stale fingerprint is detected by recalculating the fingerprint value using the data pointed to by the metadata. If the recalculated fingerprint does not match the stored fingerprint, this indicates that the fingerprint has become stale, that is, the data from which it was calculated has since been overwritten by new input data. This stale fingerprint can be overwritten by the fingerprint to be inserted. This stale fingerprint detection approach considerably reduces the amount of storage required to hold the metadata and also simplifies the implementation of the compressor byte cache 110 by avoiding the need for absolute indexing.
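The insertion and stale-detection rules of the two paragraphs above can be sketched with a toy table. The modulo hash function, the single slot per index, and the `fp_of` fingerprint function in the usage example are assumptions for illustration.

```python
# Sketch of hash-table insertion with on-demand stale-entry detection:
# a slot holding a different fingerprint is reclaimed only if
# recomputing the fingerprint at the stored offset no longer matches,
# meaning the underlying cache data has been overwritten.

def make_table(nslots: int):
    return [None] * nslots    # each slot: (fingerprint, cache_offset) or None

def insert_fingerprint(table, fp: int, offset: int, cache, compute_fp) -> bool:
    slot = fp % len(table)                    # toy hash function (assumption)
    entry = table[slot]
    if entry is not None:
        stored_fp, stored_off = entry
        if stored_fp == fp:
            table[slot] = (fp, offset)        # same value: point at newest data
            return True
        if compute_fp(cache, stored_off) == stored_fp:
            return False                      # occupant still valid: drop new fp
    table[slot] = (fp, offset)                # empty or stale slot: insert
    return True
```

A colliding fingerprint is rejected while the occupant's data is still in the cache, and accepted as soon as that data has been overwritten.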
[0091] The degree to which hash insertion failures occur depends on the load factor of hash table 108 (that is, the number of slots in the hash table divided by the number of fingerprints in the byte cache that need to be inserted into hash table 108) as well as the hash function used. It is desirable to keep the load factor low in order to minimize the storage complexity of hash table 108. On the other hand, if this load factor is too small, hash collisions occur, that is, cases in which a fingerprint cannot be inserted because its slot is occupied by a different fingerprint. If a fingerprint is not inserted, a potential duplication of the data within its fingerprint window cannot be detected, resulting in a loss of compression gain.
[0092] Therefore, the design of hash table 108 is a compromise between storage complexity and performance. It is possible to alleviate this by using multiple hash functions. However, it was found that for the purposes of long-range compression, it is possible to tolerate relatively high hash collision rates, and measures such as bucketed hashing and multiple hash functions were not critical. This is because the typical match region is much longer than the fingerprint window. Consequently, the match region contributes a number of selected fingerprints. Even if some of the selected fingerprints fail to be inserted, as long as any other fingerprint (even a single one) is successfully inserted, the entire match region will be detected. The key contributor is the expansion of the match region once a fingerprint match has been found.
[0093] Returning to Figure 2, the block compressor 218 receives the input data block 124 along with information about the matched segment from the detector and expander of the matching region 216. The block compressor 218 is operated to effect long-range compression of the matched segment and also to determine which bytes need to be passed on as literal segments to the short-range compression portion 208. However, under certain conditions, encoding the expanded matched segment of the input data block 124 into a match descriptor may not be the most efficient strategy. It may be more efficient to do short-range compression instead of long-range compression for certain segments of the data blocks. This is explained further with reference to the linked literal history list 222.
[0094] If the short-range compressor 112 was used to compress a previous occurrence of an identical byte segment (or a byte segment that contains the current byte segment), the short-range compressor 112 is likely to be more efficient than the long-range compressor 106 for that segment. This determination also needs to take into account the length of the segment, as longer segments are an exception to this rule. To make this determination, the long-range compressor 106 maintains, in the linked literal history list 222, a list of descriptors of the literal segments that have been passed on to the short-range compressor 112. When an expanded matched segment is identified in the input data block 124, with its length exceeding the minimum length limit, the linked literal history list 222 is checked to see if the segment is contained in the list. If the segment is in the linked literal history list 222, then that segment is not compressed into a match descriptor; instead, it is passed directly in literal form to the short-range compressor 112 for compression. If the segment is not in the linked literal history list 222, then that segment is compressed by the block compressor 218. The block compressor 218 communicates with the linked literal history list 222 via a signal 246. The updating of the linked literal history list 222 in relation to grammar updates is explained further in the context of the short-range compressor 112.
[0095] The block compressor 218 is operated to compress the expanded matched segment of the input data block 124 by replacing it entirely with a "match descriptor" containing (i) the position of the starting byte of the match in the byte cache of compressor 110, (ii) the position of the starting byte of the match in the input data block 124 and (iii) the length of the match. Since the match descriptor can be only a few bytes in length, whereas matched segments can be several tens of bytes or even much longer, significant compression gains can be achieved. The match descriptor is all the information needed by the long-range decompressor 120 to extract the byte segment from the byte cache of decompressor 122, so that the input data block 124 can be reconstructed exactly.
[0096] In general, the input data block 124 may contain zero or more of these match regions, interspersed with "literal" regions for which no match was available in the byte cache 110. Each match region is replaced by a match descriptor, and the literal bytes are preserved exactly and passed on to the short-range compressor 112 for the second stage of compression.
[0097] The block compressor 218 provides a long-range compressed block for each input data block 124 processed to the short-range compressor 112 via signal 134. The long-range compressed block includes information about the block length, the match count, the match descriptors and the literal byte segments. The format of the long-range compressed block will be discussed in detail with the aid of Figure 4.
[0098] Figure 4 illustrates an example version of a long-range compressed block 400 according to an aspect of the present invention.
[0099] As shown in the Figure, the long-range compressed block 400 includes a block header field 402, a plurality of match descriptor header fields (shown as examples: match descriptor header field 404, match descriptor header field 406, and match descriptor header field 408), and a literal byte segments field 410.
[0100] The block header field 402 further includes a block length field 412 and a match count field 414. The match count field 414 indicates the total number of match segments that were found in the input data block 124.
[0101] Each match descriptor header field includes a byte cache offset field, an input block offset field, and a match length field. For example, the match descriptor header field 406 further includes a byte cache offset field 416, an input block offset field 418, and a match length field 420. Note that all the match descriptor header fields 1, 2, ..., M have the same format as the match descriptor header field 406, even though only the match descriptor header field 406 is shown here in expanded form.
[0102] The byte cache offset field 416 corresponds to the byte cache offset 324 of the circular byte cache 300 of Figure 3. In particular, the byte cache offset field 416 indicates the location of the offset, with respect to the beginning of the compressor byte cache 110, where the match was found. The input block offset field 418 indicates the offset byte, with respect to the beginning of the input data block 124, where the match was found. The match length field 420 indicates the length of the matched segment in bytes.
[0103] The match count field 414 and the match descriptor fields 416, 418, and 420 can be compressed using a variable-length code. Each of these entities can be encoded using the least significant 7 bits of one or more bytes, with the most significant bits serving as "continuation bits". If the entity is small enough to be encoded using the least significant 7 bits of all the bytes used so far, the most significant bit is set to zero. A most significant bit of zero indicates that the byte is the last byte used in encoding the entity. A most significant bit of 1 means that the next byte was also used in encoding the entity, and decoding must continue until a byte with a zero in its most significant bit is found. Match counts, offsets, and match lengths tend to be small values most of the time, but can occasionally take on large values. The variable-length scheme provides significant savings in representing these values.
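A minimal sketch of this continuation-bit scheme, assuming the least significant 7-bit group is emitted first (the paragraph above does not specify the group order, so that ordering and the function names are assumptions for illustration):

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer 7 bits per byte; the most significant
    bit of each byte is the continuation bit."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        out.append(group | (0x80 if value else 0x00))
        if not value:
            return bytes(out)

def decode_varint(data: bytes, pos: int = 0):
    """Return (value, next_pos); decoding stops at the first byte whose
    most significant bit is zero."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos
```

With this code, the common small values (match counts, offsets, and lengths) cost a single byte, while large values remain representable.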
[0104] Figure 5 illustrates an example version of an input block 500 according to an aspect of the present invention. The input block 500 corresponds to an input data block within the block compressor 218 and the match region detector and expander 216.
[0105] As shown in the Figure, the input block 500 includes a segment 502, a segment 504, a segment 506, a segment 508, and a segment 510. In one version, an input block offset 512 indicates the beginning of segment 504, segment 506 indicates a fingerprint window, and segments 504, 506, and 508 together form the expanded match region equivalent to the match length 420. The input block offset 512 corresponds to the input block offset field 418 of the long-range compressed block 400 of Figure 4. Segment 506 corresponds to the window created by the fingerprint generator portion 214 and, additionally, corresponds to region 318 of the circular byte cache 300 of Figure 3. Segment 506 was compared with a window of similar size over previous bytes and found to match. The match region, segment 506, was then extended in the left direction until there were no more consecutive matching bits. This extended match region, segment 504, corresponds to region 320 of the circular byte cache 300 of Figure 3. The match region, segment 506, was further extended in the right direction until there were no more consecutive matching bits. This extended match region, segment 508, corresponds to region 322 of the circular byte cache 300 of Figure 3.
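The left and right extension of a fingerprint-window match can be sketched as follows (byte-granular for simplicity; the function and parameter names are illustrative, not from the patent):

```python
def expand_match(cache: bytes, block: bytes, cache_pos: int, block_pos: int, window: int):
    """Given a fingerprint-window match of `window` bytes at cache_pos/block_pos,
    extend it in both directions while consecutive bytes keep matching."""
    left = 0
    while (cache_pos - left > 0 and block_pos - left > 0
           and cache[cache_pos - left - 1] == block[block_pos - left - 1]):
        left += 1
    right = 0
    while (cache_pos + window + right < len(cache)
           and block_pos + window + right < len(block)
           and cache[cache_pos + window + right] == block[block_pos + window + right]):
        right += 1
    # (input block offset, byte cache offset, expanded match length)
    return block_pos - left, cache_pos - left, window + left + right
```

The three returned values correspond to fields 418, 416, and 420 of the match descriptor header of Figure 4.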
[0106] In other words, there is a match in the input block 500, starting at the byte location 512, with a segment in a byte cache, where the length of the match corresponds to the expanded match region indicated by the double arrows 514.
[0107] The block compressor 218 creates a match descriptor header for each match segment found in the input data block 124. If no match segment was found, then there is no match descriptor header and the match count field 414 is zero.
[0108] The literal byte segment field 410 contains the unmatched bytes in the input data block 124, in exactly the same order of occurrence as in the input data block 124. If all the bytes in the input data block 124 were matched with one or more segments in the compressor byte cache 110, the literal byte segment field 410 is empty, that is, it has zero bytes.
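Assembling a long-range compressed block under the layout of Figure 4 might look like the sketch below. Fixed-width fields are used here for clarity only, whereas the patent's scheme applies the variable-length code of paragraph [0103]; the function name is illustrative.

```python
import struct

def pack_long_range_block(matches, literals: bytes) -> bytes:
    """matches: list of (cache_offset, input_offset, match_length) tuples.
    Layout: block length | match count | match descriptor headers | literals."""
    body = struct.pack(">H", len(matches))                 # match count field
    for cache_off, input_off, length in matches:
        body += struct.pack(">III", cache_off, input_off, length)
    body += literals                                       # literal byte segment field
    return struct.pack(">I", 4 + len(body)) + body         # prepend block length
```

A block with no matches degenerates to a length, a zero match count, and the literals, matching the zero-match case described above.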
[0109] Returning to Figure 2, the output of the block compressor 218 is received by the short-range compression portion 208. The byte sequence analyzer 224 is operated to receive signal 134 from the block compressor 218 and a signal 256 from the grammar transform portion of compressor 114. The short-range compressor 112 uses the byte sequence analyzer 224 to find the longest prefix of the new data it has received that is already represented by an existing grammar symbol. The byte sequence analyzer 224 analyzes the input byte sequence in signal 134 based on the grammar symbols in the grammar transform portion of compressor 114. Once the byte sequence analyzer 224 has finished the analysis for each grammar symbol, it communicates with the grammar update portion 226 via a signal 254 to update the grammar, possibly by adding a new symbol or modifying an existing symbol.
[0110] According to another aspect of the present invention, the short-range compressor 112 may provide a feedback signal to the long-range compressor 106 to affect the operation of the long-range compressor 106. An example version of this aspect will now be described in greater detail.
[0111] The grammar update portion 226 also keeps track of when the grammar in the grammar transform portion of compressor 114 needs to be reset. It provides a signal 260 to the grammar transform portion of compressor 114 to initialize the grammar. In this version of the short-range compressor 112, signal 260 is also fed to reset the linked literal history list 222. Therefore, the linked literal history list 222 is reset whenever the grammar is initialized, and thus contains only the literals since the most recent grammar initialization. This means that the grammar of the short-range compressor 112 has variables that can compactly represent future occurrences of these literal segments.
[0112] When an expanded matching segment is identified in the input data block 124, with its length not exceeding a maximum length limit, the list of literals is checked to see if the segment is contained in the list. If it is, then the segment is not compressed into a match descriptor; instead, it is passed directly in literal form to the short-range compressor 112 for compression. If it is not, the matching segment is compressed by the long-range compressor 106 as described above. Note that this selective compression strategy does not require any indication of the choice to be passed to the decompression portion 204.
[0113] The adaptive arithmetic encoder 228 maps the sequence of symbols received from the byte sequence analyzer 224 into bits. It is based on the premise that certain grammar symbols occur more frequently than others. Adaptation allows the tables that track the frequency of occurrence of incoming symbols to be updated while the data is processed, which improves the compression rate of the encoder. The adaptive arithmetic encoder 228 follows the entropy coding principle, by which symbols that occur more frequently can be represented using fewer bits. When a sequence is processed by arithmetic encoding, frequently used symbols are represented with fewer bits and rarely used symbols are represented with more bits, resulting in an overall reduction in the number of bits used. The adaptive arithmetic encoder 228 provides an efficiently compressed and encoded output 138 ready for transmission.
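The benefit of adaptation can be illustrated with a toy model that tracks symbol frequencies and charges each symbol its ideal code length, -log2(p). This is only an illustration of the entropy-coding principle, not the patent's arithmetic coder; the Laplace smoothing and the 256-symbol alphabet are assumptions.

```python
from collections import Counter
from math import log2

def adaptive_cost_in_bits(symbols: bytes) -> float:
    """Sum of ideal code lengths under a Laplace-smoothed adaptive model
    over a 256-symbol alphabet, with counts updated after each symbol
    (as the encoder and decoder would do in lockstep)."""
    counts, total, bits = Counter(), 0, 0.0
    for s in symbols:
        p = (counts[s] + 1) / (total + 256)   # current adaptive estimate
        bits -= log2(p)                       # ideal code length for s
        counts[s] += 1
        total += 1
    return bits
```

A skewed sequence costs fewer total bits than a varied one of the same length, which is why frequently occurring grammar symbols end up with shorter codes.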
[0114] The output of the short-range compressor 112 is transmitted over the communication link 126. It is essential that the communication link 126 provides a reliable transport or link layer, to ensure that the compressed blocks 138 are delivered to the decompression portion 204 in the order of transmission and with no errors or missing blocks. The short-range decompression portion 210 performs the inverse operation of the short-range compression portion 208, to reproduce blocks consisting of match descriptors and literal segments from the compressed blocks 138.
[0115] The adaptive arithmetic decoder 230 receives the compressed blocks 138 from the communication link 126, which were encoded by the adaptive arithmetic encoder 228. To decode the bits back into symbols such that the decoded symbols match exactly the symbols encoded on the compression side 202, the frequency tables in the adaptive arithmetic decoder 230 must be updated in the same way and in the same step as in the adaptive arithmetic encoder 228. The adaptive arithmetic decoder 230 provides decoded symbols 262 to the grammar transform portion of decompressor 118.
[0116] The grammar transform portion of decompressor 118 works with the grammar update portion 232 to provide the uncompressed grammar transform of the symbols to the byte sequence assembler 234. Note that the short-range decompressor 116 needs to be aware of the transformations and grammar updates on the side of the short-range compressor 112, such that the grammars on both the compressor and the decompressor side are identical, in order to retrieve the original input data block 124.
[0117] The byte sequence assembler 234 receives a signal 264 from the grammar transform portion of decompressor 118 and is operated to assemble the bytes into the appropriate format of the uncompressed block 142, which includes match descriptors and literal segments. The format of the uncompressed block 142, which is identical to that of the compressed block 134, will be further explained with the aid of Figure 4. The byte sequence assembler 234 updates the grammar update portion 232, adding any new symbols, using signal 266.
[0118] The short-range decompression portion 210 provides the uncompressed block 142, which includes match descriptors and literal segments, to the long-range decompression portion 212. The long-range decompression portion 212 performs the reverse operation of the long-range compression portion 206, to reconstruct the input data block 124 based on the uncompressed match descriptors and literal segments.
[0119] The long-range decompression portion 212 includes the long-range decompressor 120 (as shown by the dotted line) and the decompressor byte cache 122, similar to Figure 1. However, the communication between the different elements of the long-range decompression portion 212 and its operation are explained in detail with reference to Figure 2.
[0120] The data retrieval portion 236 is operated to receive the uncompressed match descriptors and literal segments of the uncompressed block 142. Based on the format of the uncompressed block 142, as discussed in Figure 4, it separates the match descriptors and the literal segments. The data retrieval portion 236 provides the match descriptors 270 to the decompressor byte cache 122, indicating the number of bytes that need to be picked up and the starting address of the byte segment in the decompressor byte cache 122. The data retrieval portion 236 provides the literal segments 272 to the output block assembler 238.
[0121] The decompressor byte cache 122 picks up matched segments based on the starting address and match length provided in the match descriptor and provides the matched segments 274 to the output block assembler 238. Note that the long-range decompression portion 212 needs to be aware of the updates to the compressor byte cache 110, such that the contents of the cache on both the compressor side and the decompressor side are identical, in order to retrieve the original input data block 124. The decompressor byte cache 122 also receives a signal 278 from the cache update portion 240 to add the byte segments that have been decompressed.
[0122] The output block assembler 238 reconstructs the input data block 124 based on the literal segments 272 received from the data retrieval portion 236 and the matched segments 274 received from the decompressor byte cache 122. A block header, illustrated in Figure 4, indicates the number of match descriptors contained in the compressed block 138 received from the compression portion 202. Each match descriptor specifies where the matched bytes are in the decompressor byte cache 122, the match length, and the location of the match in the uncompressed block 142. The output block assembler 238 constructs the matched portion of the block by simply copying the matched byte segments 274 from the decompressor byte cache 122 and placing them in the correct locations of the uncompressed block. This can leave unfilled gaps in the uncompressed block, corresponding to the literal segments. Each unfilled gap can then be filled using the literal segments 272, as these bytes occur in exactly the same order as they appeared in the input data block 124. This completes the construction of a decompressed block 276 identical to the input data block 124.
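The reconstruction just described can be sketched as below; the function and variable names are illustrative, and the match descriptors are modeled as plain tuples.

```python
def rebuild_block(block_len: int, matches, literals: bytes, cache: bytes) -> bytes:
    """matches: (cache_offset, input_offset, length) tuples; literal bytes are
    consumed in order to fill the gaps the copied matches leave behind."""
    out = bytearray(block_len)
    filled = bytearray(block_len)             # 1 where a matched byte was placed
    for cache_off, input_off, length in matches:
        out[input_off:input_off + length] = cache[cache_off:cache_off + length]
        filled[input_off:input_off + length] = b"\x01" * length
    lit = iter(literals)
    for i in range(block_len):
        if not filled[i]:
            out[i] = next(lit)                # literals arrive in original order
    return bytes(out)
```

Because the literals were emitted in their original order, a single linear pass over the gaps suffices to finish the block.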
[0123] Similarly to the data update portion 220 in the long-range compression portion 206, the cache update portion 240 in the long-range decompression portion 212 adds the decompressed block 276 to the decompressor byte cache 122, overwriting the oldest bytes. This is done to ensure that the updated decompressor byte cache 122 is identical to the compressor byte cache 110, so that future input data blocks 124 are correctly decompressed.
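A sketch of the wrap-around update that both caches must apply identically; this mirrors the contiguous circular buffer behavior recited in the claims, and the class name is illustrative.

```python
class CircularByteCache:
    """FIFO byte cache: a block that cannot fit contiguously at the current
    insertion position is written at the start of the buffer instead."""
    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.pos = 0          # next insertion position
        self.valid_end = 0    # position of the last valid byte

    def add(self, block: bytes):
        if self.pos + len(block) > len(self.buf):
            self.valid_end = self.pos   # data beyond here is stale
            self.pos = 0                # wrap: overwrite the oldest bytes
        self.buf[self.pos:self.pos + len(block)] = block
        self.pos += len(block)
        self.valid_end = max(self.valid_end, self.pos)
```

Only the bytes up to `valid_end` are usable for finding matches, which is why compressor and decompressor must apply exactly the same update sequence.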
[0124] The selective compression of the input data block 124, depending on the characteristics of the input, the contents of the compressor byte cache 110, and the state of the grammar, results in improved compression gain over schemes that process every input segment through the same short-range and/or long-range compression processing steps.
[0125] The joint optimization of long-range and short-range compression is an advantage over techniques that apply only long-range compression or only short-range compression, or that apply both independently such that they are not aware of each other. There is a significant degree of interdependence between the performances of the two compression stages. Consequently, it is important to optimize the design parameters of the long-range compressor 106 taking into account the behavior of the short-range compressor 112.
[0126] Extensive parametric studies have been conducted to determine optimal parameters such as the minimum match length, the length of the fingerprint window, the fingerprint selection rate, the size of the byte cache, and the size of the grammar. The compression gain of the long-range compression portion 206 alone increases as the minimum match segment size is reduced, since smaller matches can then be detected and compressed. However, this reduces the performance of the short-range compressor 112 to the degree that the overall compression gain deteriorates with the reduction of the minimum match length. The reason for this behavior is that the smaller matches interrupt the continuity of the byte sequence presented to the short-range compressor 112 (that is, many smaller literal segments). This makes it more difficult for the grammar transform portion of compressor 114 to find the underlying structure. Therefore, it is preferable to use a higher value for the minimum match length, such that the overall compression gain is maximized.
[0127] The formats of a long-range compressed data block, the input data block, and a circular byte cache will be discussed below according to aspects of the invention.
[0128] A summary of test results demonstrating the advantages of the compression scheme, according to aspects of the invention, is presented below. Table 1 presents results for two types of data, showing the performance gains of joint long-range and short-range compression.

Table 1. Summary of performance gains from joint long-range and short-range compression.
  Data type                                       | v44 (LZ, 65 KB) | Grammar-based (5 MB) | Joint (100 MB cache + 5 MB grammar)
  Compressible HTTP response entity data          | 3.67            | 9.99                 | 19.11
  HTTP response data with pre-compressed entities | 1.05 (est.)     | 1.1 (est.)           | 1.37
[0129] The compressible HTTP response entity data represents a subset of traffic that is known not to contain entities already compressed at the source. This traffic is not compressed and thus can be compressed with high compression gains. This is evident from the results shown in the first row of Table 1. Results are shown for v44, which is a Lempel-Ziv type compressor with a 65 KB history buffer; for a grammar-based compressor with 5 MB of space for the grammar; and for a joint long-range compressor (100 MB byte cache) and grammar-based compressor (5 MB grammar). The compression gain for the Lempel-Ziv type v44 compressor is 3.67. The compression gain for the grammar-based short-range compressor alone is 9.99. The compression gain for the joint long-range compressor and grammar-based short-range compressor, according to aspects of the invention, is 19.11. Thus, it can be seen that although the grammar-based compressor provides an improvement over v44, the integration of a long-range compressor front end almost doubles the compression gain for this type of data.
[0130] The second row of Table 1 shows the results for HTTP response data that may contain entities that are already compressed at the source. Typically, these are embedded images (JPEG, GIF, PNG) or compressed files. As expected, the possible compression in this case is smaller. This is evident from the results presented in the second row. The compression gain for the Lempel-Ziv type v44 compressor is 1.05 (estimated). The compression gain for the grammar-based short-range compressor alone is 1.1 (estimated). The compression gain for the joint long-range compressor and grammar-based short-range compressor, according to aspects of the invention, is 1.37. However, even in this case, the addition of a long-range compressor front end has a significant impact, providing roughly 35% improvement over using only short-range compression. These results are clearly indicative of the advantages obtained by the techniques presented according to aspects of the invention.
[0131] As discussed above with the aid of Figures 1 to 5, aspects of the present invention provide lossless data compression techniques that improve over techniques currently used for the efficient transmission of Internet traffic over communication links such as satellite or terrestrial links. The lossless data compression technique, according to one aspect of the invention, consists of two compression stages.
[0132] A long-range compressor front end, based on a cache containing previously transmitted bytes, captures macro redundancies in the byte stream. The primary role of this stage is to provide the compressor with access to a large history of previously transmitted data (that is, a large buffer of several tens or hundreds of megabytes), while keeping low the processing complexity necessary to exploit the large amount of stored data to achieve compression. The advantage of this stage is that macro redundancies seen within a long history of the byte stream can be captured with very modest processing resources.
[0133] The long-range compressor is designed to have a very low computational complexity, so that it can use a large history buffer (cache) of tens or hundreds of megabytes. As a result, it can exploit the long-range redundancies in Internet web traffic. Furthermore, even if the transmitted byte stream contains objects that were compressed at the source, if such objects are duplicated in the transmitted byte stream within the history buffer of the long-range compressor, they are compressed very efficiently. The limitation on the complexity of the long-range compressor means that it cannot completely eliminate certain types of redundancies. These redundancies are eliminated by a more powerful second stage that combines a grammar transform and arithmetic coding, for example, a grammar-based compressor, in one version of the present invention.
[0134] The second stage is based on a grammar-based compressor, which uses a sophisticated grammar transform and adaptive arithmetic coding. However, any type of short-range compressor can be used. The main role of the second stage is to exploit any residual or micro redundancy at the output of the first stage by applying compression techniques much more powerful than those of the first stage. Since the first stage has already eliminated long-range redundancies, the second stage can operate with a shorter history (that is, less data to be processed) with no loss in performance. This allows the use of techniques much more powerful than those of the first stage, and delivers a nearly optimal compression gain. The grammar transform and the adaptive arithmetic encoder used by the second stage are central to its performance. The strategy of capturing long-range macro redundancies with a computationally simpler but efficient first stage allows a more sophisticated second stage to capture the more complex structural micro redundancies. This keeps the complexity of the overall design at a reasonable level, while achieving nearly optimal compression gains.
[0135] The data compression technique, according to aspects of the invention, exploits redundancy in the input data stream at the lowest level, the byte stream level, to achieve data compression. Operating at the byte level has the advantage of giving this technique much wider applicability, as it is not aware of higher-layer protocols, applications, or the type of data represented by the byte stream. This allows it to be applied at any layer of the network protocol stack. For example, it can be applied at the application layer (on byte streams comprising HTTP objects) or at the network layer on IP packets.
[0136] The data compression technique, according to aspects of the invention, will result in significant reductions in the channel bandwidth and/or transmission power requirements for carrying web traffic between end points on the Internet. This basic capability can be exploited in different ways. For example, a higher number of users can be supported on a given satellite transponder bandwidth, or on a given terrestrial backhaul link capacity. Alternatively, the same number of users can be served with a higher bandwidth, which reduces the perceived latency in the case of interactive applications or improves the response time in the case of web browsing. Although the exact degree of improvement depends on the nature of the traffic and the implementation, the reduction in link bandwidth can be as high as 25% over the techniques currently employed.
[0137] The foregoing description of several preferred versions of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The example versions described above were chosen and described to best explain the principles of the invention and its practical application, and thus to allow others skilled in the art to make the best use of the invention in various versions and with various modifications as appropriate for the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto.
Claims (13)
[0001]
System (100, 200) for compressing flowing data blocks, each of the flowing data blocks comprising a number of data bits, said system characterized by the fact that it comprises:
a first compressor (106) operable to:
- receive and store n number of blocks from the flowing data blocks in a compressor cache,
- compress a current flowing data block by:
· comparing the current flowing data block with the n flowing data blocks, determining identical segments of the current flowing data block and the n flowing data blocks as expanded matching segments, each expanded matching segment comprising a number of consecutive bits within the current flowing data block that match a corresponding number of bits from a previous block of the n flowing data blocks, and
· generating and outputting a match descriptor representing each expanded matching segment of the block being compressed, where the match descriptor identifies or points to a location of the identical segment in the n stored flowing data blocks, and
- output one or more unmatched byte segments of the current flowing data block as literal segments, where each literal segment comprises consecutive bits reflecting a remainder of the bits in the current block that were not part of any determined expanded matching segment;
in which the compression of each block being compressed comprises:
- calculating a fingerprint in question based on a number of bits of the block being compressed within a fingerprint window;
- determining when the fingerprint in question matches a previous fingerprint associated with a respective number of bits from a previous block of the flowing data blocks; and, when a fingerprint match is determined,
- determining that the bits associated with the fingerprint in question match the bits in the previous data block associated with the fingerprint, and
- determining the expanded matching segment to be compressed by comparing successive bits of the block being compressed in both directions outside the fingerprint window with corresponding bits from the previous block associated with the fingerprint; and
a second compressor (112) operable to:
- compress each of the one or more literal segments emitted by the first compressor (106) by means of a first-stage string compression and a second-stage arithmetic or entropy compression, and
- output a compressed data block including the match descriptor received from the first compressor (106) and a compressed data string based on the compressed literal segments.
[0002]
System, according to claim 1, characterized by the fact that: said cache comprises a contiguous circular buffer structure operated to store blocks in a first-in/first-out manner, said cache being operated to store a block at the beginning of said contiguous circular buffer structure when the entire block cannot be inserted contiguously at the next insertion position, and wherein a quantity of data that is usable in said contiguous circular buffer structure to find matches against the data block to be compressed is based on the position of a last valid byte at the end of said contiguous circular buffer structure.
[0003]
System, according to claim 1, characterized by the fact that said second compressor (112) comprises: an analysis portion operable to analyze each literal segment into bit segments; a grammar transform portion operable to represent each analyzed bit segment as a respective symbol and to compress each literal segment by representing current bit segments that match previous bit segments with the respective symbol; and an adaptive arithmetic coding portion operable to represent the respective symbols of the corresponding bit segments as a number of bits based on the frequency of occurrence of the symbols.
[0004]
System, according to claim 1, characterized by the fact that the determination of when the fingerprint in question matches a previous fingerprint associated with a respective number of bits from a previous block of the flowing data blocks comprises: applying a hash function to the fingerprint in question to determine a number of specific locations within a fingerprint cache in which a matched fingerprint can be stored, together with respective data identifying a location in the compressor cache in which the respective bits from the previous block are stored; determining that one of the specific locations in the fingerprint cache contains the previous matched fingerprint; and accessing the respective data stored in the fingerprint cache together with the matched fingerprint and, based on the accessed data, accessing the location in the compressor cache where the respective bits of the previous block are stored; in which the determination that the bits associated with the fingerprint in question match the bits in the previous data block associated with the fingerprint, and the determination of the expanded matching segment to be compressed, are performed based on the respective accessed bits from the previous block.
[0005]
System, according to claim 1, characterized by the fact that the second compressor (112) comprises a short-range dictionary-based compressor or a short-range grammar-based compressor and an arithmetic or entropy coding compressor, and wherein the first compressor (106) is further operable to adaptively determine whether compression of a particular expanded matching segment would be more efficient if performed by the second compressor (112) instead of the first compressor (106), based on a number of factors including the contents of the compressor cache, the length of the particular expanded matching segment, and a grammar or dictionary state of the second compressor (112).
[0006]
System, according to claim 5, characterized by the fact that the first compressor (106) is further operable to maintain a literal history list comprising a list of descriptors representing the literal segments issued to the second compressor (112); and wherein, when an expanded matching segment is determined, the first compressor (106) is operable to determine whether the expanded matching segment is contained in the literal history list and wherein, when the expanded matching segment is determined to be contained in the literal history list, the first compressor (106) is operable to output the expanded matching segment as a literal segment for compression by the second compressor (112).
[0007]
System, according to claim 6, characterized by the fact that said second compressor (112) is further operable to provide a feedback signal to said first compressor (106) to trigger a reset of the literal history list when the grammar or dictionary state of the second compressor (112) is reset.
[0008]
Method for compressing flowing data blocks, each of the flowing data blocks comprising a number of data bits, said method characterized by the fact that it comprises: - receiving, through a first compressor (106), a number n of blocks of the flowing data blocks; - storing the received blocks in a compressor cache; - compressing, through the first compressor (106), a current flowing data block by comparing the current flowing data block with the n stored flowing data blocks, determining identical segments of the current flowing data block and the n stored flowing data blocks as expanded matching segments, each expanded matching segment comprising a number of consecutive bits within the current flowing block that match a corresponding number of bits of a previous block of the n flowing data blocks; - generating and outputting, through the first compressor (106), a matching descriptor representing each expanded matching segment of the block being compressed, wherein the matching descriptor identifies or points to a location of the identical segment in the n stored flowing data blocks; and - outputting, through the first compressor (106), one or more unmatched byte segments of the current data block as literal segments, wherein each literal segment comprises consecutive bits reflecting a remainder of the block bits that were not part of any determined expanded matching segment; wherein the compression of each block being compressed comprises: · calculating a fingerprint in question based on a number of bits of the block being compressed within a fingerprint window; · determining when the fingerprint in question matches a previous fingerprint associated with a respective number of bits of a previous block of the flowing data blocks; · when a fingerprint match is determined, determining that the bits associated with the fingerprint in question match the bits in the previous data block associated with the previous matched fingerprint; and · determining the expanded matching segment of the block being compressed by comparing successive bits of the block being compressed, in both directions outside the fingerprint window, with corresponding bits of the previous block associated with the matched fingerprint; and - compressing, through a second compressor (112), each of the one or more literal segments output through the first compressor (106), wherein the compression of each literal segment comprises a first-stage short-string compression process and a second-stage entropy coding or arithmetic compression process; and - outputting, through the second compressor (112), a compressed data block including the matching descriptor generated and output through the first compressor (106), and a compressed data string based on the compressed literal segments.
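The fingerprint-match-and-expand step of the claim above can be sketched as follows. This is an illustrative, non-limiting sketch, not the patented implementation: the window size, the use of CRC-32 as the fingerprint function, and all names are assumptions (a rolling hash such as Rabin's would typically be used in practice).

```python
import zlib

WINDOW = 8  # fingerprint window size in bytes (assumed value)

def fingerprint(data, pos, window=WINDOW):
    """Fingerprint over a fixed window of bits/bytes in the block."""
    return zlib.crc32(data[pos:pos + window])

def expand_match(current, cur_pos, cache, cache_pos, window=WINDOW):
    """Given matching fingerprint windows at cur_pos (current block) and
    cache_pos (previous stored block), extend the match in both directions
    outside the window to obtain the expanded matching segment."""
    # Extend backwards from the start of the fingerprint window.
    left = 0
    while (cur_pos - left > 0 and cache_pos - left > 0
           and current[cur_pos - left - 1] == cache[cache_pos - left - 1]):
        left += 1
    # Extend forwards past the end of the fingerprint window.
    right = window
    while (cur_pos + right < len(current) and cache_pos + right < len(cache)
           and current[cur_pos + right] == cache[cache_pos + right]):
        right += 1
    # The matching descriptor can then identify (cache offset, length)
    # of the expanded matching segment.
    return cache_pos - left, cur_pos - left, left + right
```

The returned triple (cache offset, current offset, length) is the information a matching descriptor would carry in place of the matched bytes.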
[0009]
Method according to claim 8, characterized by the fact that the number n of blocks of the flowing data blocks is stored in a contiguous circular buffer structure in a first-in/first-out manner, wherein, when an entire block cannot be inserted contiguously at the next insertion position, the block is stored at the beginning of the contiguous circular buffer structure, and wherein the quantity of data in the contiguous circular buffer structure that is used to find matches against the data block to be compressed is based on the position of a last valid byte at the end of the contiguous circular buffer structure.
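The wrap-around behavior of the claimed contiguous circular cache can be sketched as below. This is a minimal illustration under assumed names and sizes, not the patented structure itself.

```python
class ContiguousCircularCache:
    """Sketch of the claimed FIFO cache: each block is stored contiguously;
    a block that does not fit at the next insertion position wraps to the
    beginning, and the last valid byte at the end of the buffer bounds the
    data usable for match searches."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.insert_pos = 0   # next insertion position
        self.valid_end = 0    # position of the last valid byte at the end

    def insert(self, block):
        if self.insert_pos + len(block) > len(self.buf):
            # Block cannot be inserted contiguously here: record how far
            # valid data reaches, then wrap to the start of the buffer.
            self.valid_end = self.insert_pos
            self.insert_pos = 0
        start = self.insert_pos
        self.buf[start:start + len(block)] = block
        self.insert_pos += len(block)
        self.valid_end = max(self.valid_end, self.insert_pos)
        return start  # offset a matching descriptor could point to
```

Keeping blocks contiguous means a matching descriptor can reference a single (offset, length) range without handling a split across the buffer boundary.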
[0010]
Method according to claim 8, characterized in that the first-stage compression process comprises: parsing, through a parsing portion within the second compressor (112), each literal segment into bit segments; representing, through a grammar-transform portion within the second compressor (112), each of the parsed bit segments as a respective symbol, and compressing each literal segment by representing current bit segments that match previous bit segments with the respective symbol; and representing, through an adaptive arithmetic coding portion within the second compressor (112), the respective symbols of the matched bit segments as a number of bits based on the frequency of occurrence of the symbols.
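The parse-then-symbolize-then-code idea of this claim can be illustrated with a deliberately simplified sketch. It is not the grammar transform of the patent (e.g. not the YK algorithm): fixed-length parsing and an ideal-codelength estimate stand in for the real grammar transform and adaptive arithmetic coder, and all names are assumptions.

```python
from collections import Counter
from math import log2

def second_stage_sketch(literal, seg_len=4):
    """Parse a literal into fixed-length segments, map each distinct
    segment to a symbol, and estimate the arithmetically coded size:
    frequent symbols cost fewer bits (-log2 of their probability)."""
    segments = [literal[i:i + seg_len] for i in range(0, len(literal), seg_len)]
    symbols = {}   # segment -> symbol id (the "grammar" stand-in)
    stream = []
    for seg in segments:
        stream.append(symbols.setdefault(seg, len(symbols)))
    freq = Counter(stream)
    total = len(stream)
    # Ideal arithmetic-coded length in bits for the symbol stream.
    bits = sum(-log2(freq[s] / total) for s in stream)
    return stream, bits
```

A repeated segment maps to the same symbol, and the more often a symbol occurs, the fewer bits each occurrence ideally costs.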
[0011]
Method according to claim 8, characterized in that the first-stage compression comprises a short-range dictionary-based compression process or a short-range grammar-based compression process, and wherein the method further comprises adaptively determining, through the first compressor (106), whether compression of a particular expanded matching segment would be more efficient if performed by the second compressor (112) instead of the first compressor (106), based on a number of factors including the contents of the compressor cache, the length of the particular expanded matching segment, and a grammar or dictionary state of the second compressor (112).
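One plausible shape for the adaptive routing decision in this claim is sketched below. The inputs, the threshold, and the rules are illustrative assumptions; the patent only names the factors, not this policy.

```python
def route_match(match_len, cache_hit, second_stage_knows_segment,
                min_long_range_len=64):
    """Decide whether an expanded matching segment is better emitted as a
    long-range matching descriptor ("first" compressor) or passed through
    as a literal to the short-range "second" compressor.
    All thresholds here are assumed, not from the patent."""
    if second_stage_knows_segment:
        return "second"  # its grammar/dictionary already models this data
    if not cache_hit:
        return "second"  # no usable reference in the compressor cache
    if match_len < min_long_range_len:
        return "second"  # descriptor overhead would outweigh the gain
    return "first"       # emit a matching descriptor
```

Short matches are cheaper to re-compress locally than to reference, which is why the segment length enters the decision.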
[0012]
Method according to claim 11, characterized by the fact that it further comprises: maintaining, through the first compressor (106), a literal history list comprising a list of descriptors signifying the literal segments output to the second compressor (112); and wherein, when an expanded matching segment is determined, determining, through the first compressor (106), whether the expanded matching segment is contained in the literal history list; and wherein, when the expanded matching segment is determined to be contained in the literal history list, outputting, through the first compressor (106), the expanded matching segment as a literal segment for compression by the second compressor (112).
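The literal history list of this claim, including the reset triggered by the second compressor's feedback signal in claim 13, can be sketched as below. Representing descriptors as hashes of the segments is an assumption for illustration.

```python
class LiteralHistory:
    """Sketch of the claimed literal history list: descriptors of segments
    already sent to the second compressor, so that a repeat of such a
    segment can be routed there again, where its grammar/dictionary
    already models it."""

    def __init__(self):
        self.descriptors = set()

    def record(self, segment):
        # Descriptor stands in for the segment (hashing is an assumption).
        self.descriptors.add(hash(segment))

    def contains(self, segment):
        return hash(segment) in self.descriptors

    def reset(self):
        # Triggered by the second compressor's feedback signal when its
        # grammar or dictionary state is reset (claim 13): history about
        # the old grammar state is no longer valid.
        self.descriptors.clear()
```

The reset keeps the first compressor's routing decisions consistent with what the second compressor's grammar actually still contains.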
[0013]
Method according to claim 12, characterized by the fact that it further comprises: providing, through the second compressor (112), a feedback signal to the first compressor (106) to trigger a reset of the literal history list when the grammar or dictionary state of the second compressor (112) is reset.
Similar technologies:
Publication number | Publication date | Patent title
BR102012002559B1|2020-09-24|SYSTEM AND METHOD FOR COMPRESSION OF FLUENT DATA BLOCKS
US9680500B2|2017-06-13|Staged data compression, including block level long range compression, for data streams in a communications system
US10277716B2|2019-04-30|Data compression for priority based data traffic, on an aggregate traffic level, in a multi stream communications system
US9363309B2|2016-06-07|Systems and methods for compressing packet data by predicting subsequent data
CN107210753B|2021-03-09|Lossless reduction of data by deriving data from prime data units residing in a content association filter
US6597812B1|2003-07-22|System and method for lossless data compression and decompression
AU721734B2|2000-07-13|A lempel-ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases
US8698657B2|2014-04-15|Methods and systems for compressing and decompressing data
US9203887B2|2015-12-01|Bitstream processing using coalesced buffers and delayed matching and enhanced memory writes
US8456332B2|2013-06-04|Systems and methods for compression of logical data objects for storage
US8516002B2|2013-08-20|Deflate file data optimization
EP2971721B1|2021-06-16|Data compression for priority-based data traffic, on an aggregate traffic level, in a multi-stream communications system
Xue et al.2018|An optimized data hiding scheme for deflate codes
BR102014006340B1|2022-02-01|Method and apparatus for staged data compression and decompression
Suel2019|Delta compression techniques
Skibiński et al.2008|A highly efficient XML compression scheme for the web
Yan et al.2019|Z-Dedup: A case for deduplicating compressed contents in cloud
US7612692B2|2009-11-03|Bidirectional context model for adaptive compression
Waghulde et al.2014|New Data Compression Algorithm and its Comparative Study with Existing Techniques
Afek et al.2011|Efficient processing of multi-connection compressed web traffic
FI115937B|2005-08-15|Lossless data compression and decompression
Saini et al.2014|Compressing the Data Densely by New Geflochtener to Accelerate Web
Family patents:
Publication number | Publication date
US9716734B2|2017-07-25|
US10567458B2|2020-02-18|
US20130018932A1|2013-01-17|
EP2546993A1|2013-01-16|
US20140325088A1|2014-10-30|
BR102012002559A2|2013-07-23|
EP2546993B1|2021-06-30|
US20170318066A1|2017-11-02|
Cited references:
Publication number | Application date | Publication date | Applicant | Patent title

US4814746A|1983-06-01|1989-03-21|International Business Machines Corporation|Data compression method|
US5166987A|1990-04-04|1992-11-24|Sony Corporation|Encoding apparatus with two stages of data compression|
US5426779A|1991-09-13|1995-06-20|Salient Software, Inc.|Method and apparatus for locating longest prior target string matching current string in buffer|
US5541995A|1994-04-18|1996-07-30|Apple Computer Inc.|Method and apparatus for decoding non-sequential data packets|
US5627534A|1995-03-23|1997-05-06|International Business Machines Corporation|Dual stage compression of bit mapped image data using refined run length and LZ compression|
US5951623A|1996-08-06|1999-09-14|Reynar; Jeffrey C.|Lempel- Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases|
US6032113A|1996-10-02|2000-02-29|Aura Systems, Inc.|N-stage predictive feedback-based compression and decompression of spectra of stochastic data using convergent incomplete autoregressive models|
US6400289B1|2000-03-01|2002-06-04|Hughes Electronics Corporation|System and method for performing lossless data compression and decompression|
US6492917B1|2001-10-31|2002-12-10|Hughes Electronics Corporation|System and method for implementation of the YK lossless data compression algorithm using a modular computational architecture|
US6624762B1|2002-04-11|2003-09-23|Unisys Corporation|Hardware-based, LZW data compression co-processor|
US20060106870A1|2004-11-16|2006-05-18|International Business Machines Corporation|Data compression using a nested hierarchy of fixed phrase length dictionaries|
US7840774B2|2005-09-09|2010-11-23|International Business Machines Corporation|Compressibility checking avoidance|
US7840744B2|2007-01-30|2010-11-23|International Business Machines Corporation|Rank select operation between an XIO interface and a double data rate interface|
US7827237B2|2007-03-12|2010-11-02|Citrix Systems, Inc.|Systems and methods for identifying long matches of data in a compression history|
GB2450336B|2007-06-19|2009-09-23|Sony Comp Entertainment Europe|Method and apparatus for compressing and decompressing data|
US7538695B2|2007-06-29|2009-05-26|Rmi Corporation|System and method for deflate processing within a compression engine|
US8090027B2|2007-08-29|2012-01-03|Red Hat, Inc.|Data compression using an arbitrary-sized dictionary|
CN103609071B|2011-03-28|2017-04-12|思杰系统有限公司|Systems and methods for tracking application layer flow via a multi-connection intermediary device|
US9363339B2|2011-07-12|2016-06-07|Hughes Network Systems, Llc|Staged data compression, including block level long range compression, for data streams in a communications system|
US9479383B2|2011-07-12|2016-10-25|Hughes Network Systems, Llc|Data compression for priority based data traffic, on an aggregate traffic level, in a multi stream communications system|
US20130018932A1|2011-07-12|2013-01-17|Hughes Network Systems, Llc|System and method for long range and short range data compression|
US9378560B2|2011-06-17|2016-06-28|Advanced Micro Devices, Inc.|Real time on-chip texture decompression using shader processors|
US9479383B2|2011-07-12|2016-10-25|Hughes Network Systems, Llc|Data compression for priority based data traffic, on an aggregate traffic level, in a multi stream communications system|
US20130018932A1|2011-07-12|2013-01-17|Hughes Network Systems, Llc|System and method for long range and short range data compression|
US9363339B2|2011-07-12|2016-06-07|Hughes Network Systems, Llc|Staged data compression, including block level long range compression, for data streams in a communications system|
US8643515B2|2011-10-28|2014-02-04|International Business Machines Corporation|Compressing data using an encoding table|
US8868520B1|2012-03-01|2014-10-21|Netapp, Inc.|System and method for removing overlapping ranges from a flat sorted data structure|
US8838826B2|2012-04-04|2014-09-16|Google Inc.|Scalable robust live streaming system|
US9351196B2|2012-08-31|2016-05-24|International Business Machines Corporation|Byte caching in wireless communication networks|
KR20140046815A|2012-10-11|2014-04-21|삼성전자주식회사|Power management integrated circuit and operating method thereof|
US10015285B2|2013-03-14|2018-07-03|Huawei Technologies Co., Ltd.|System and method for multi-stream compression and decompression|
BR102014006340B1|2013-03-15|2022-02-01|Hughes Network Systems, Llc|Method and apparatus for staged data compression and decompression|
EP2971721B1|2013-03-15|2021-06-16|Hughes Network Systems, LLC|Data compression for priority-based data traffic, on an aggregate traffic level, in a multi-stream communications system|
US9860332B2|2013-05-08|2018-01-02|Samsung Electronics Co., Ltd.|Caching architecture for packet-form in-memory object caching|
US10127236B1|2013-06-27|2018-11-13|EMC IP Holding Company|Filesystem storing file data in larger units than used for metadata|
US10229132B2|2013-07-15|2019-03-12|International Business Machines Corporation|Optimizing digest based data matching in similarity based deduplication|
US10296598B2|2013-07-15|2019-05-21|International Business Machines Corporation|Digest based data matching in similarity based deduplication|
US10339109B2|2013-07-15|2019-07-02|International Business Machines Corporation|Optimizing hash table structure for digest matching in a data deduplication system|
US9836474B2|2013-07-15|2017-12-05|International Business Machines Corporation|Data structures for digests matching in a data deduplication system|
US9594766B2|2013-07-15|2017-03-14|International Business Machines Corporation|Reducing activation of similarity search in a data deduplication system|
US10789213B2|2013-07-15|2020-09-29|International Business Machines Corporation|Calculation of digest segmentations for input data using similar data in a data deduplication system|
US20180048593A1|2015-02-17|2018-02-15|Hewlett Packard Enterprise Development Lp|Flow entry generating and packet processing based on flow entry|
US10235044B2|2015-07-27|2019-03-19|Datrium, Inc.|System and methods for storage data deduplication|
US10250278B2|2015-12-21|2019-04-02|Intel Corporation|Compression of a set of integers|
SG11201806334WA|2016-02-15|2018-08-30|Certis Cisco Security Pte Ltd|Method and system for compression and optimization of in-line and in-transit information security data|
US10091128B2|2016-05-23|2018-10-02|Hughes Network Systems, Llc|Dynamic history multistream long range compression|
US10719406B1|2016-06-23|2020-07-21|EMC IP Holding Company LLC|Enhanced fingerprint computation for de-duplicated data|
US10224957B1|2017-11-27|2019-03-05|Intel Corporation|Hash-based data matching enhanced with backward matching for data compression|
CN107920129A|2017-12-07|2018-04-17|郑州云海信息技术有限公司|A kind of method, apparatus, equipment and the cloud storage system of data storage|
US20190182160A1|2017-12-08|2019-06-13|Corsa Technology Inc.|Packet classification using fingerprint hash table|
RU2672625C1|2017-12-11|2018-11-16|федеральное государственное автономное образовательное учреждение высшего образования "Национальный исследовательский ядерный университет МИФИ" |Device for compression of data|
US10128868B1|2017-12-29|2018-11-13|Intel Corporation|Efficient dictionary for lossless compression|
US10963177B2|2018-04-30|2021-03-30|EMC IP Holding Company LLC|Deduplication using fingerprint tries|
US10812630B2|2018-11-19|2020-10-20|Fungible, Inc.|Merging techniques in data compression accelerator of a data processing unit|
US10997123B2|2018-11-19|2021-05-04|Fungible, Inc.|Matching techniques in data compression accelerator of a data processing unit|
US10727865B2|2018-11-19|2020-07-28|Fungible, Inc.|Data striping for matching techniques in data compression accelerator of a data processing unit|
Legal status:
2013-07-23| B03A| Publication of a patent application or of a certificate of addition of invention [chapter 3.1 patent gazette]|
2018-12-18| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-10-15| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2020-04-22| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2020-09-24| B16A| Patent or certificate of addition of invention granted|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 03/02/2012, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
申请号 | 申请日 | 专利标题
US13/180,969|US20130018932A1|2011-07-12|2011-07-12|System and method for long range and short range data compression|
US13/180,969|2011-07-12|